sec midterm

advertisement
Second Midterm 18-2-1437
1st semester 1436 / 1437H
In Datamining Course
College:…Science in az Zulfi………
Program: CSI Dept.
Course Name: Datamining
Course Code: CSI 449
Section:
Date: 28-12-1436
Number of pages: 4
The student's name:
University ID:
1
Q(1) Select the wright answer of the followings:
Q1: Choose the right Answer? [ 10 marks]
1.
Knowledge is referred to -----------------------------------------------a) Non-trivial extraction of implicit previously unknown and potentially useful information
from data
b) Set of columns in a database table that can be used to identify each record within this table
uniquely
c) Collection of interesting and useful patterns in a dataset
d) None of theses
2. Which of the following is not a data mining functionality?
a) Characterization and Discrimination
b) Classification and regression
c) Selection and interpretation
d) Clustering and Analysis
3.
a)
b)
c)
d)
Classification is------------------------------------------A subdivision of a set of examples into a number of classes
A measure of the accuracy, of the classification of a concept that is given by a certain
theory
The task of assigning a classification to a set of examples
None of these
4.
a)
b)
c)
d)
Cluster is----------------------------------------------Symbolic representation of facts or ideas from which information can potentially be
extracted
Group of similar objects that differ significantly from other objects
Operations on a database to transform or simplify data in order to prepare it for a machinelearning algorithm
None of these
5.
a)
b)
c)
d)
Data selection is--------------------------------------------The actual discovery phase of a knowledge discovery process
The stage of selecting the right data for a KDD process
A subject-oriented integrated time variant non-volatile collection of data in support of
management
None of these
6.
Classification task referred to---------------------------------a) A subdivision of a set of examples into a number of classes
2
b) A measure of the accuracy, of the classification of a concept that is given by a certain
theory
c) The task of assigning a classification to a set of examples
d) None of these
7.
a)
b)
c)
d)
Euclidean distance measure is----------------------------A stage of the data process in which new data is added to the existing selection
The process of finding a solution for a problem simply by enumerating all possible
solutions according to some pre-defined order and then testing them
The distance between two points as calculated using the || ||2 distance.
The distance between two points as calculated using the || ||n distance
8.
9.
a)
b)
c)
d)
Nationality, ethnicity, language, style, biological species, and zip codes are
examples of-------------Mass , length, duration, plane angle, energy and electric charge are examples of---Nominal
Ordinal
Ratio
Interval
10. Median,
a)
b)
c)
d)
percentiles, and rank correlation are operations can be computed on
Nominal data
Ordinal data
Ratio data
Interval data
11. Extract
meaningful numeric indices from the text, and, thus, make the information
contained in the text accessible to the various data mining (statistical and machine
learning) algorithms.
12. World
a)
b)
c)
d)
e)
Wide Web is data type of ----------------Graph data
Ordered data
Record data
Web Data
document
13. Noise
a)
b)
c)
d)
refers to ------------------------------------------------modification of feature values
modification of original values
an error in original values
a combining two or more attributes into a single attribute
3
14. ---------------
are data objects with characteristics that are considerably different than
most of the other data objects in the data set.
a) Outliers data
b) Missing Values
c) Duplicate Data
d) Data Preprocessing
15. Aggregation
is ------------------------------------------------a) a combining two or more attributes into a single attribute .
b) the main technique employed for data selection .
c) the mean average of aggregated data
16. Dimensionality
Reduction algorithm to -----------------------------------------a) Reduce amount of time and memory required by data mining algorithms and it may
help to eliminate irrelevant features or reduce noise.
b) Mapping Data to a New Space
c) Create new attributes that can capture the important information in a data set much
more efficiently than the original attributes.
17. Create
new attributes that can capture the important information in a data set much
more efficiently than the original attributes
a) Feature Creation
b) Feature Selection
c) Feature Extraction
d) All
18. Numerical measure of how alike two data objects are
a) Irrelevant
b) Dissimilarity
c) Similarity
d) All
19. Minkowski
Distance is a generalization of Euclidean Distance and it is equivalent to
Euclidean Distance when r is equal to
a) 0
b) 1
c) 2
d) infinity
20. Similarity
Between Binary Vectors can be measured by
a) Minkowski Distance
b) Jaccard Coefficients
c)Euclidean Distance d) Mean
4
Q2: Complete the followings? [ 5 marks]
A) What are data quality problems?
123B) What are Data Preprocessing
?
1234-
C) Complete the following table that represents Similarity and dissimilarity for simple
attributes?
5
Q3) Compute the Distance between the following data types? [ 5 marks]
A) P1 ( 1,-2,3) and P2( 1, -1, 2 )
B) S1=( Fail , Good , Good , Better) and S2= ( Fail ,Good , Better , Good)
Where : The range data type is { fail, pass, good , V. Good, better }
C) p = 1 0 0 1 0 0 1 0 0 1
and
q= 1100101001
6
Download