Questions 2 (General)

Q1. For each data mining task below, indicate whether it is predictive modeling,
association analysis, cluster analysis or anomaly detection.
(5 points)
(a) Deciding whether to issue a loan to an applicant, based on demographic and financial
data (with reference to a database of similar data on prior customers).
Answer: Predictive modeling
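Predictive modeling learns from prior cases whose outcome is already known. Below is a minimal sketch, assuming a hypothetical table of prior customers with two invented numeric features (income in thousands, debt-to-income ratio) and a known outcome label. A 1-nearest-neighbor rule stands in for whatever classifier a lender would actually use.

```python
import math

# Hypothetical prior customers: (income in $1000s, debt-to-income ratio) -> known outcome.
# All values are invented for illustration only.
PRIOR_CUSTOMERS = [
    ((85.0, 0.10), "repaid"),
    ((60.0, 0.25), "repaid"),
    ((30.0, 0.60), "defaulted"),
    ((25.0, 0.75), "defaulted"),
]

def predict_loan_outcome(applicant):
    """Predict an applicant's label from the single most similar prior customer (1-NN)."""
    _, label = min(PRIOR_CUSTOMERS, key=lambda row: math.dist(row[0], applicant))
    return label

print(predict_loan_outcome((70.0, 0.20)))  # repaid
print(predict_loan_outcome((28.0, 0.70)))  # defaulted
```

In practice the features would be scaled first; here the unscaled income values dominate the distance.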
(b) In an online bookstore, making recommendations to customers concerning additional
items to buy, based on the buying patterns in prior transactions.
Answer: Predictive modeling / association analysis
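Association analysis looks for rules of the form "customers who bought X also bought Y" and scores them by support (how often X and Y appear together) and confidence (how often Y appears given X). This minimal sketch scores one hand-picked rule on invented baskets; the book titles are hypothetical, and a real system would enumerate candidate rules with an algorithm such as Apriori or FP-growth.

```python
# Hypothetical bookstore transactions; each set is one customer's basket.
TRANSACTIONS = [
    {"data mining", "statistics"},
    {"data mining", "statistics", "python"},
    {"data mining", "python"},
    {"statistics"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_lhs, rule_rhs = {"data mining"}, {"statistics"}
print(support(rule_lhs | rule_rhs, TRANSACTIONS))   # 0.5
print(confidence(rule_lhs, rule_rhs, TRANSACTIONS))  # about 0.667
```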
(c) Identifying a network data packet as dangerous (virus, hacker attack), based on
comparison to other packets whose threat status is known.
Answer: Anomaly detection
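Anomaly detection flags objects that deviate strongly from the bulk of the data. This is a minimal sketch, assuming a single invented feature (packet size in bytes) and a z-score rule with an arbitrarily chosen threshold of 2.0. Real intrusion detection would use many features and, given known threat labels, often a supervised model as well.

```python
import statistics

def flag_anomalies(values, threshold=2.0):
    """Return the values whose z-score (distance from the mean in standard deviations) exceeds threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical packet sizes in bytes; one packet is far larger than the rest.
packet_sizes = [500, 510, 495, 505, 490, 515, 500, 5000]
print(flag_anomalies(packet_sizes))  # [5000]
```

A z-score uses the mean and standard deviation, which the outlier itself inflates; median-based scores are a common, more robust alternative.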
(d) Identifying segments of similar customers.
Answer: Cluster analysis
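Customer segmentation groups customers without predefined labels. Below is a minimal sketch of k-means, one common clustering algorithm chosen here as an assumption since the question names none. It runs on invented customer features (visits per year, average spend). The number of clusters k and the naive "first k points" initialization are illustrative choices, and both can change the resulting segments.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per year, average spend per visit).
customers = [(2, 20), (50, 400), (3, 25), (1, 15), (48, 380), (52, 420)]
centroids, clusters = k_means(customers, k=2)
print(centroids)  # [(2.0, 20.0), (50.0, 400.0)]
```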
(e) Printing of custom discount coupons at the conclusion of a grocery store checkout,
based on what you just bought and what others have bought previously.
Answer: Association analysis / predictive modeling
Q2. Classify the following attributes as binary, discrete or continuous. Also classify
them as qualitative (nominal or ordinal) or quantitative (interval or ratio).
(10 points)
Income: Continuous, quantitative (ratio)
Property area: Continuous, quantitative (ratio)
Ownership of boat (yes/no): Binary, qualitative (nominal)
Days of the week (coded Mon, Tue, Wed, …): Discrete, qualitative (nominal)
Number of beds in a hospital: Discrete, quantitative (ratio)
Final grades in an MBA class (A+, A, …): Discrete, qualitative (ordinal)
Petal length: Continuous, quantitative (ratio)
Iris flower type (virginica, etc.): Discrete, qualitative (nominal)
Shirt size (XS, S, M, L, XL): Discrete, qualitative (ordinal)
Frequent flier miles accumulated: Continuous, quantitative (ratio)
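The practical difference between these categories is which operations make sense. Nominal values support only equality checks, ordinal values add ordering, interval values add meaningful differences, and ratio values add meaningful ratios because a true zero exists. This sketch illustrates two attributes from the list; the function names and sample values are hypothetical.

```python
# Ordinal attribute: the order of shirt sizes is meaningful, but the gaps between them are not.
SHIRT_SIZE_ORDER = ["XS", "S", "M", "L", "XL"]

def is_larger(size_a, size_b):
    """Compare two shirt sizes by rank; valid because the attribute is ordinal."""
    return SHIRT_SIZE_ORDER.index(size_a) > SHIRT_SIZE_ORDER.index(size_b)

# Ratio attribute: income has a true zero, so "twice as much" is meaningful.
def income_ratio(income_a, income_b):
    """How many times larger income_a is than income_b."""
    return income_a / income_b

print(is_larger("L", "S"))         # True
print(income_ratio(80000, 40000))  # 2.0
```

By contrast, saying "XL is twice M" has no meaning, since the ranks 4 and 2 are only positions in an order.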
Q3. TRUE/FALSE questions
__F__ Association rule mining is equivalent to classification because rules are derived
in both cases.
__T__ Dealing with high dimensionality is often a challenge in data mining.
__T__ The median and the mean are the same for a population that is normally
distributed.
__T__ Outliers are data objects with characteristics that are considerably different
from most of the other data objects in the data set.
__F__ Euclidean distance is the only similarity measure that can be used in cluster
analysis.
__T__ Scatterplots are good data visualization tools.
__F__ Cluster analysis always provides a scientific, clear-cut answer to a
segmentation problem.
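Euclidean distance is only one of several proximity measures used in clustering. This sketch compares it with Manhattan distance and cosine similarity on two made-up points. The points are 2.24 apart in Euclidean terms, yet their cosine similarity is essentially 1 because they point in the same direction. Which pairs count as "similar" therefore depends on the chosen measure.

```python
import math

def euclidean(a, b):
    """Straight-line distance between a and b."""
    return math.dist(a, b)

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

p, q = (1.0, 2.0), (2.0, 4.0)
print(euclidean(p, q))          # 2.23606797749979
print(manhattan(p, q))          # 3.0
print(cosine_similarity(p, q))  # very close to 1.0
```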
Q4. MULTIPLE CHOICE QUESTIONS (5 points)
5.1 Similarity between data points, when deciding whether or not they belong to the
same cluster, is measured by:
a. Distance measure
b. Whether or not they belong to the same prediction class
c. (a) and (b)
d. (a) or (b)