Answers to the Second Midterm

Ministry of Higher Education
Majmaah University
College of Science at Az-Zulfi
Dept. of Computer Science & Information
(Computer Bridging Program)
Second Midterm Exam of Data Mining
University ID number:
Name:
Q1: Choose the correct answer
1- Zip codes and employee ID numbers are attributes of type ----a----, but hardness of minerals and street numbers are attributes of type ----d----.
2- Calendar dates and temperature are attributes of type ----b----, but age, mass, length, and electrical current are attributes of type ----c----.
a) Nominal
b) Interval
c) Ratio
d) Ordinal
3- In document data, each document is represented in a ----a---- form, in which each component (attribute) holds the number of times an item occurs in the document.
a) Vector
b) Matrix
c) Record
d) Transaction
4- Genomic sequence data is a type of ----b----
a) Graph data
b) Ordered data
c) Record data
d) Chemical data
5- Noise and outliers, missing values, and duplicate data are problems of ----d----
a) Processing data
b) Mapping data
c) Evaluating data
d) Data quality
6- ---------b-------- refers to modification of original values.
a) Missing Values
b) Noise
c) Transform
d) Selection
7- ----a---- is the combining of two or more attributes into a single attribute.
8- ----b---- is the main technique employed for data selection.
a) Aggregation
b) Sampling
c) Duplicate
d) Transform
9- Which of the following reduces the amount of time and memory required by data mining algorithms and may help to eliminate irrelevant features or reduce noise?
a) Transaction data
b) Handling missing values
c) Dimensionality reduction
d) Data processing
10- A cluster is ----a----
a) Group of similar objects that differ significantly from other objects
b) Operations on a database to transform or simplify data in order to prepare it for a
machine-learning algorithm
c) Symbolic representation of facts or ideas from which information can potentially
be extracted
d) None of these
11- Classification is ----a----
12- A classification task refers to ----c----
a) A subdivision of a set of examples into a number of classes
b) A measure of the accuracy of the classification of a concept that is given by a
certain theory
c) The task of assigning a classification to a set of examples
d) None of these
13- The Euclidean distance measure is
a) A stage of the Knowledge Discovery process in which new data is added to the
existing selection.
b) The process of finding a solution for a problem simply by enumerating all possible
solutions according to some pre-defined order and then testing them.
c) The distance between two points as calculated.
d) None of these
14- Dimensionality reduction techniques include
a) Principal Component Analysis
b) Sampling
c) Aggregation
d) All
15- Mapping data to a new space (frequency domain)
a) Fourier transform
b) Wavelet transform
c) Entropy
d) a and b
16- Numerical measure of how alike two data objects are
a) Irrelevant
b) Dissimilarity
c) Similarity
d) All
17- Create new attributes that can capture the important information in a data set much more efficiently than the original attributes
a) Feature Creation
b) Feature Selection
c) PCA
d) All
18- The simplest approach is to divide the region into a number of rectangular cells of equal volume and define density as the number of points each cell contains
a) Euclidean distance
b) Probability density
c) Euclidean density
d) None
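The cell-based density definition in item 18 can be sketched in Python as follows. This is only an illustrative sketch: the function name `cell_density`, the cell size, and the sample points are assumptions made for the example and are not part of the exam.

```python
from collections import Counter
from math import floor


def cell_density(points, cell_size):
    """Count how many 2-D points fall into each square cell of side cell_size.

    Hypothetical helper: each point is mapped to the integer index of the
    grid cell that contains it; the density of a cell is its point count.
    """
    counts = Counter()
    for x, y in points:
        cell = (floor(x / cell_size), floor(y / cell_size))
        counts[cell] += 1
    return counts


# Made-up sample points, for illustration only.
sample_points = [(0.5, 0.5), (0.7, 1.9), (2.1, 0.3), (3.5, 3.5), (3.9, 3.1)]
density = cell_density(sample_points, cell_size=2.0)
```

With these sample points, two points land in cell (0, 0), one in cell (1, 0), and two in cell (1, 1). The result depends strongly on the chosen cell size, which is why this is described as the simplest approach.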
19- Minkowski distance is a generalization of Euclidean distance, and it is equivalent to Euclidean distance when r is equal to
a) 0
b) 1
c) 2
d) infinity
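As a quick check of item 19, the Minkowski distance of order r can be sketched in Python. The function name `minkowski` is an assumption made for this example; the two points are p1 = (0, 2) and p2 = (2, 0) from Q3. With r = 1 the formula gives the city-block distance, with r = 2 it reduces to the Euclidean distance, and in the limit r = infinity it becomes the largest per-attribute difference.

```python
def minkowski(p, q, r):
    """Minkowski distance of order r between equal-length sequences p and q."""
    if r == float("inf"):
        # Limit r -> infinity: the largest per-attribute difference.
        return max(abs(a - b) for a, b in zip(p, q))
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


p, q = (0, 2), (2, 0)
d1 = minkowski(p, q, 1)                # city-block (Manhattan) distance
d2 = minkowski(p, q, 2)                # same value as the Euclidean distance
d_inf = minkowski(p, q, float("inf"))  # supremum distance
```

For these two points, `d2` equals the square root of 8, the same value obtained by hand for dist(p1, p2) in Q3.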
20- Similarity between binary vectors can be measured by
a) Minkowski distance
b) Jaccard coefficients
c) Euclidean distance
d) All
///////////////////////////////////////////////////////////////////////////////////////////////////////
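Q2: A) What are the types of data sets, and what are the data quality problems?
Types of data sets:
1. Graph data
2. Ordered data
3. Record data
Data quality problems (as named in Q1, item 5): noise and outliers, missing values, duplicate data.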
B) Complete the following table, which gives similarity and dissimilarity for simple attributes.
Q3) a) Compute the Euclidean distance between each pair of points.
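Euclidean distance between two points p and q (the points here are p1, p2, p3, p4):
$\mathrm{dist}(p, q) = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$
where n is the number of dimensions and p_k, q_k are the k-th attributes of p and q.
dist(p1, p2) = ((p1x - p2x)^2 + (p1y - p2y)^2)^0.5
= ((0 - 2)^2 + (2 - 0)^2)^0.5
= (4 + 4)^0.5 = 8^0.5 ≈ 2.83
And so on for the remaining pairs.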
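The remaining pairs can be computed the same way. The following Python sketch applies the same formula to every pair; the coordinates are read from the question's point table (p1 = (0, 2), p2 = (2, 0), p3 = (3, 1), p4 = (5, 1)), while the names `points`, `euclidean`, and `distances` are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt

# Coordinates taken from the point table of this question.
points = {"p1": (0, 2), "p2": (2, 0), "p3": (3, 1), "p4": (5, 1)}


def euclidean(p, q):
    """Square root of the summed squared attribute differences."""
    return sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))


# Distance for every unordered pair of points, rounded to 3 decimals.
distances = {
    (a, b): round(euclidean(points[a], points[b]), 3)
    for a, b in combinations(points, 2)
}

for (a, b), d in distances.items():
    print(f"dist({a}, {b}) = {d}")
```

The first entry, dist(p1, p2) = 2.828, agrees with the hand calculation (2.83 to two decimals). Because the distance is symmetric and dist(p, p) = 0, only the six unordered pairs need to be computed.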
B) Compute the similarity between the binary vectors p and q:
p = 1 0 0 0 0 0 0 0 0 0
and q = 0 0 0 0 0 0 1 0 0 1
Answer
p= 1 0 0 0 0 0 0 0 0 0
q= 0 0 0 0 0 0 1 0 0 1
M01 = 2 (the number of attributes where p was 0 and q was 1)
M10 = 1 (the number of attributes where p was 1 and q was 0)
M00 = 7 (the number of attributes where p was 0 and q was 0)
M11 = 0 (the number of attributes where p was 1 and q was 1)
SMC = (M11 + M00)/(M01 + M10 + M11 + M00) = (0+7) / (2+1+0+7) = 0.7
J = (M11) / (M01 + M10 + M11) = 0 / (2 + 1 + 0) = 0
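The counts and both coefficients above can be checked with a short Python sketch. The function name `binary_similarity` is an assumption made for this example, and returning 0.0 for the Jaccard coefficient when neither vector contains a 1 is a convention chosen here, not something stated in the exam.

```python
def binary_similarity(p, q):
    """Return (SMC, Jaccard) for two equal-length binary vectors."""
    m01 = sum(1 for a, b in zip(p, q) if a == 0 and b == 1)
    m10 = sum(1 for a, b in zip(p, q) if a == 1 and b == 0)
    m00 = sum(1 for a, b in zip(p, q) if a == 0 and b == 0)
    m11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)

    # SMC counts both 1-1 and 0-0 matches.
    smc = (m11 + m00) / (m01 + m10 + m11 + m00)

    # Jaccard ignores 0-0 matches. Assumption: define it as 0.0 when
    # no attribute is 1 in either vector, to avoid dividing by zero.
    denom = m01 + m10 + m11
    jaccard = m11 / denom if denom else 0.0
    return smc, jaccard


p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
smc, jaccard = binary_similarity(p, q)
```

For these vectors the sketch gives SMC = 0.7 and J = 0, matching the hand calculation. The gap between the two values shows why Jaccard is preferred for sparse binary data: the seven 0-0 matches inflate SMC even though p and q share no 1s.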
Points for Q3 a):
point   x   y
p1      0   2
p2      2   0
p3      3   1
p4      5   1