Uploaded by Zaid Halhouli

toaz.info-midterm-review-solution-pr 4aef18f9ca3d1b2a0323bfbd987fd198

advertisement
Data Mining
TCSS555 Autumn 2007
Midterm Review
1. How is a data warehouse different from a database? How are they similar?
2. In real-world data, tuples with missing values for some attributes are a common
occurrence. Describe various methods for handling this problem.
1
Data Mining
TCSS555 Autumn 2007
3. Discuss issues to consider during data integration.
4. Briefly outline the major steps of decision tree classification.
2
Data Mining
TCSS555 Autumn 2007
5. Suppose that the data for analysis include the attribute age. The age values for the data
tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
35, 35, 35, 35, 36, 40, 45, 46, 52, 70. Answer the following:
a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
b) Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
c) Use normalization by decimal scaling to transform the value 35 for age.
d) Comment on which method you would prefer to use for the given data, giving reasons
as to why.
3
Data Mining
TCSS555 Autumn 2007
6. Consider the association rule below, which was mined from the student database at Big-
University:
(Rule 1)
Suppose that the number of students at the university (that is the number of task-relevant data
tuples) is 5000, that 70% of the students are majoring in science, that 64% of the students are
registered in programs leading to undergraduate degrees, and that 56% of the undergraduates at the
university major in science.
a) Compute the confidence and support for the rule (Rule 1).
b) Consider (Rule 2) below.
(Rule 2)
Suppose that 30% of science students are majoring in biology. Would you consider (Rule 2) to
be novel with respect to (Rule 1)? Explain.
4
Data Mining
TCSS555 Autumn 2007
5
Data Mining
TCSS555 Autumn 2007
7. a) Find all the frequent itemsets using APriori and FP-Growth, respectively, from the
following transaction database:
b) List all of the strong association rules (with support s and confidence c) matching the
following metarule, where X is a variable representing customers, and itemi denotes
variables representing items (e.g. “A”, “B”, etc.):
6
Data Mining
TCSS555 Autumn 2007
7
Download