# Questions 2 (General)

```
Q1. For each data mining task below, indicate whether it is predictive modeling,
association analysis, cluster analysis or anomaly detection.
(5 points)
(a) Deciding whether to issue a loan to an applicant, based on demographic and financial
data (with reference to a database of similar data on prior customers).
Predictive
(b) In an online bookstore, making recommendations to customers concerning additional
items to buy, based on the buying patterns in prior transactions.
Predictive / Association
(c) Identifying a network data packet as dangerous (virus, hacker attack), based on
comparison to other packets whose threat status is known.
Anomaly Detection (arguably Predictive, since the threat status of the comparison packets is known)
(d) Identifying segments of similar customers.
Cluster
(e) Printing of custom discount coupons at the conclusion of a grocery store checkout,
based on what you just bought and what others have bought previously.
Association / Predictive
Q2. Classify the following attributes as binary, discrete or continuous. Also classify
them as qualitative (nominal or ordinal) or quantitative (interval or ratio).
(10 points)
Income: => Continuous - quantitative - ratio
Property area: => Continuous - quantitative - ratio
Ownership of boat (yes/no): => Binary - qualitative - nominal
Days of the week (coded Mon, Tue, Wed, ...): => Discrete - qualitative - nominal
Number of beds in a hospital: => Discrete - quantitative - ratio
Final grades in an MBA class (A+, A, ...): => Discrete - qualitative - ordinal
Petal length: => Continuous - quantitative - ratio
Iris flower type (virginica, etc.): => Discrete - qualitative - nominal
Shirt size (XS, S, M, L, XL): => Discrete - qualitative - ordinal
Frequent flier miles accumulated: => Continuous - quantitative - ratio
Q3. TRUE/FALSE questions
__F__ Association Rule Mining is equivalent to Classification because in both cases
rules are derived.
__T__ Dealing with high dimensionality is often a challenge in data mining.
__T__ The median and the mean are the same for a population that is normally
distributed.
__T__ Outliers are data objects with characteristics that are considerably different
than most of the other data objects in the data set.
__F__ Euclidean distance is the only similarity measure that can be used in cluster
analysis.
__T__ Scatterplots are good data visualization tools.
__F__ Cluster analysis always provides a scientific, clear-cut answer to a
segmentation problem.
Q4) MULTIPLE CHOICE QUESTIONS: (5 points)
5.1 Similarity between data points, when deciding whether or not they belong to the
same cluster, is measured by:
a. Distance measure
b. Whether or not they belong to the same prediction class
c. (a) and (b)
d. (a) or (b)
```
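
The Q3 statement about Euclidean distance and the Q4 item on distance measures can be illustrated with a small sketch. The function names and the two sample points below are hypothetical and are not part of the original question sheet. The sketch only shows that several different proximity measures can be computed for the same pair of data points, so Euclidean distance is one option among several.

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine_similarity(p, q):
    # Cosine of the angle between the two vectors (1.0 means same direction).
    # Assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Hypothetical two-attribute data points, chosen only for illustration.
x = (1.0, 2.0)
y = (4.0, 6.0)

print("Euclidean:", euclidean(x, y))
print("Manhattan:", manhattan(x, y))
print("Cosine similarity:", cosine_similarity(x, y))
```

Which measure is appropriate depends on the attribute types and scales. A clustering run on the same data can group points differently under each measure, which is one reason cluster analysis does not always give a clear-cut answer.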