1. Assume a base cuboid of 10 dimensions contains only three base cells: (1) (a1, d2,
d3, d4, . . . , d9, d10), (2) (d1, b2, d3, d4, . . . , d9, d10), and (3) (d1, d2, c3, d4, . . . , d9,
d10), where a1 ≠ d1, b2 ≠ d2, and c3 ≠ d3. The measure of the cube is count.
a. How many nonempty cuboids will a full data cube contain?
b. How many nonempty aggregate (i.e., nonbase) cells will a full cube
contain?
c. How many nonempty aggregate cells will an iceberg cube contain if the
condition of the iceberg cube is “count ≥ 2”?
d. A cell, c, is a closed cell if there exists no cell, d, such that d is a
specialization of cell c (i.e., d is obtained by replacing a * in c by a non-*
value) and d has the same measure value as c. A closed cube is a data cube
consisting of only closed cells. How many closed cells are in the full cube?
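Hand-derived counts for Exercise 1 can be cross-checked by brute force, since three base cells over 10 dimensions generalize to at most 3 × 2^10 cells. The Python sketch below is one way to do that, not a prescribed method. The helper names (`all_cells`, `closed_cells`) and the string labels for the dimension values are illustrative assumptions, and the closed-cell test assumes the usual definition in which a specialization only disqualifies a cell if it has the same count.

```python
from collections import Counter
from itertools import product

STAR = "*"

def all_cells(base_cells):
    """Return {cell: count} for every nonempty cell of the full cube."""
    counts = Counter()
    n = len(base_cells[0])
    for cell in base_cells:
        # Every subset of dimensions replaced by * yields one generalization.
        for mask in product((False, True), repeat=n):
            counts[tuple(STAR if m else v for m, v in zip(mask, cell))] += 1
    return counts

def closed_cells(counts):
    """Cells that have no one-step specialization with the same count."""
    not_closed = set()
    for cell, cnt in counts.items():
        for i, v in enumerate(cell):
            if v != STAR:
                parent = cell[:i] + (STAR,) + cell[i + 1:]
                if counts[parent] == cnt:
                    not_closed.add(parent)
    return [c for c in counts if c not in not_closed]

# Assumed labels: d1..d10 are the shared values; a1, b2, c3 differ from them.
d = [f"d{i}" for i in range(1, 11)]
base = [
    tuple(["a1"] + d[1:]),
    tuple([d[0], "b2"] + d[2:]),
    tuple(d[:2] + ["c3"] + d[3:]),
]

counts = all_cells(base)
cuboids = {tuple(v == STAR for v in cell) for cell in counts}  # part (a)
aggregate = [c for c in counts if STAR in c]                   # part (b)
iceberg = [c for c in aggregate if counts[c] >= 2]             # part (c)
closed = closed_cells(counts)                                  # part (d)

print(len(cuboids), len(aggregate), len(iceberg), len(closed))
```

Checking only one-step specializations is sufficient here because count never decreases under generalization: if a more distant specialization has the same count, every cell on the path between the two does as well.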
2. Suppose a data cube, C, has D dimensions, and the base cuboid contains k
distinct tuples.
a. Present a formula to calculate the minimum number of cells that the cube
C may contain.
b. Present a formula to calculate the maximum number of cells that the cube
C may contain.
c. Answer parts (a) and (b) above as if the count in each cube cell must be no
less than a threshold, v.
d. Answer parts (a) and (b) above as if only closed cells are considered (with
the minimum count threshold, v).
3. A flight data warehouse for a travel agent consists of six dimensions: traveler,
departure (city), departure_time, arrival, arrival_time, and flight; and two
measures: count and avg_fare, where avg_fare stores the concrete fare at the
lowest level but the average fare at other levels.
a. Suppose the cube is fully materialized. Starting with the base cuboid
[traveler, departure (city), departure_time, arrival, arrival_time, and
flight], what specific OLAP operations (e.g., roll-up flight to airline)
should one perform in order to list the average fare per month for each
business traveler who flies American Airlines (AA) from L.A. in the year
2004?
b. Suppose we want to compute a data cube where the condition is that the
minimum number of records is 10 and the average fare is over $500.
Outline an efficient cube computation method (based on common sense
about flight data distribution).
4. Suppose that a base cuboid has three dimensions, A, B, C, with the following
number of cells: |A| = 1,000,000, |B| = 100, and |C| = 1000. Suppose that each
dimension is evenly partitioned into portions for chunking.
a. Assuming each dimension has only one level, draw the complete lattice of
the cube.
b. If each cube cell stores one measure with 4 bytes, what is the total size of
the computed cube if the cube is dense?
c. State the order for computing the chunks in the cube that requires the
least amount of space, and compute the total amount of main memory
space required for computing the 2-D planes.
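For parts (a) and (b) of Exercise 4, the lattice and the dense cube size can be tabulated mechanically. The Python sketch below is a minimal illustration under stated assumptions: each cuboid stores one cell per combination of values of its dimensions, the apex cuboid ("all") holds a single cell, and |A| is taken as one million. The variable names are illustrative only.

```python
from itertools import combinations
from math import prod

card = {"A": 1_000_000, "B": 100, "C": 1000}  # cells per dimension
BYTES_PER_CELL = 4                             # one 4-byte measure per cell

# Enumerate the lattice from the base cuboid ABC down to the apex cuboid.
lattice = {}
for k in range(len(card), -1, -1):
    for dims in combinations(card, k):
        # prod() of an empty sequence is 1, which gives the apex one cell.
        lattice["".join(dims) or "all"] = prod(card[x] for x in dims)

for name, cells in lattice.items():
    print(f"{name:>3}: {cells:,} cells")

total_cells = sum(lattice.values())
total_bytes = total_cells * BYTES_PER_CELL
print(f"total: {total_cells:,} cells, {total_bytes:,} bytes")
```

The total equals (|A| + 1)(|B| + 1)(|C| + 1) cells, because each dimension contributes either one of its values or the aggregate value *.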
5. Consider the following multifeature cube query: Grouping by all subsets of {item,
region, month}, find the minimum shelf life in 2004 for each group and the
fraction of the total sales due to tuples whose price is less than $100 and whose
shelf life is between 1.25 and 1.5 of the minimum shelf life.
a. Draw the multifeature cube graph for the query.
b. Express the query in extended SQL.
c. Is this a distributive multifeature cube? Why or why not?
6. Suppose that the following table is derived by attribute-oriented induction.
class        birth_place   count
Programmer   USA           180
Programmer   Others        120
DBA          USA           20
DBA          Others        80
a. Transform the table into a crosstab showing the associated t-weights and
d-weights.
b. Map the class Programmer into a (bidirectional) quantitative descriptive rule,
for example:
∀X, Programmer(X) ⇔ (birth_place(X) = “USA” ∧ . . .)
[t : x%, d : y%] . . . ∨ ( . . . ) [t : w%, d : z%]
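The t-weights and d-weights asked for in Exercise 6 follow directly from the counts. The Python sketch below assumes the standard definitions: the t-weight of a generalized tuple is its share of the tuples in its own class, and the d-weight is its share of the tuples with the same birth_place across the classes. The dictionary layout and variable names are illustrative assumptions.

```python
from collections import defaultdict

# Counts from the induced table, keyed by (class, birth_place).
counts = {
    ("Programmer", "USA"): 180,
    ("Programmer", "Others"): 120,
    ("DBA", "USA"): 20,
    ("DBA", "Others"): 80,
}

class_total = defaultdict(int)  # row totals of the crosstab
place_total = defaultdict(int)  # column totals of the crosstab
for (cls, place), n in counts.items():
    class_total[cls] += n
    place_total[place] += n

# t-weight: share within the class; d-weight: share within the birth_place.
t_weight = {k: 100 * n / class_total[k[0]] for k, n in counts.items()}
d_weight = {k: 100 * n / place_total[k[1]] for k, n in counts.items()}

for (cls, place), n in counts.items():
    key = (cls, place)
    print(f"{cls:<10} {place:<6} count={n:<4} "
          f"t={t_weight[key]:.2f}% d={d_weight[key]:.2f}%")
```

The t-weights within one class sum to 100%, as do the d-weights within one birth_place value, which gives a quick consistency check on a hand-built crosstab.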