Exam 1435

Kingdom of Saudi Arabia
Ministry of Higher Education
Majmaah University
Vice rectorate for Academic Affairs
Measurement & Assessments Administration
Model Answer for the Final Examination in Data Mining
Second semester 1434 / 1435 H
College:…Science in az Zulfi… (Model Answer)
Program: CSI Dept.
Course Name: Data Mining
Course Code: CSI 449-Z
Section: 273
Date: 20-7-1435
Duration: two hours
Number of pages: 5
The student's name:
University ID:
Examination Guidelines
1- Write your name and university identification number clearly in the space provided.
2- Use a blue or black pen for answers and a pencil for drawings.
3- Books, notes, papers, and publications are not allowed in the examination room.
4- Students are not allowed to leave the examination room until 30 minutes after the
start of the test.
Learning Outcomes
a  Knowledge
b  Cognitive skills
c  Interpersonal skills and taking responsibility
d  Communication, information technology and numerical skills
e  Psychomotor skills
Grades
Question   Learning outcome   Grade
1          a                  /…………….
2          a, b               /…………….
3          b, c               /…………….
4          c, d               /…………….
5          ………………..           /…………….
Final grade: ...…../….......
Faculty member: Dr. Weal Khedr.
Corrector 1: /…………….
Corrector 2: /…………….
Review Committee: Name /……………. Signature /…………….
Question (1): Choose the correct answer for each of the following (the correct answer is shown in bold and underlined).
1) The values of ----------- attribute are just different names and provide only enough
information to distinguish one object from another.
a. Ratio
b. Interval
c. Ordinal
d. Nominal
2) The values of a/an ------------ attribute provide enough information to order objects.
a. Ratio
b. Interval
c. Ordinal
d. Nominal
3) For ------------ attributes, the differences between values are meaningful, i.e., a unit
of measurement exists.
a. Ratio
b. Interval
c. Ordinal
d. Nominal
4) It is a type of data set that is based on a sequence of data or transactions.
a. Record
b. Graph
c. Ordered
d. Data Matrix
5) It reduces the amount of time and memory required by data mining algorithms.
a. Data Reduction
b. Data Mining
c. Data aggregation
d. Data matrix
6) It is the main technique employed for data selection.
a. Noise
b. Sampling
c. Clustering
d. Histogram
7) Combining two or more attributes (or objects) into a single attribute (or object)
a. Noise
b. Sampling
c. Aggregation
d. Histogram
8) It can be used to map data to a new space (the frequency domain).
a. Aggregation
b. Data Reduction
c. Fourier transform
d. Sampling
9) It refers to modification of original values.
a. Aggregation
b. Data selection
c. Noise
d. Clustering
10) Records can be classified by using a -----------based classifier.
a. Rules
b. Clusters
c. Decision tree
d. Measure of Impurity
Answer key:
Question  1  2  3  4  5  6  7  8  9  10
Answer    d  c  b  c  a  b  c  c  c  a
Question (2): Answer the following:
(A) Define data mining.
• Non-trivial extraction of implicit, previously unknown, and potentially useful
information from data, or
• Exploration and analysis, by automatic or semi-automatic means, of large
quantities of data in order to discover meaningful patterns.
(B) What are the data mining tasks?
1. Classification [Predictive]
2. Clustering [Descriptive]
3. Association Rule Discovery [Descriptive]
4. Sequential Pattern Discovery [Descriptive]
5. Regression [Predictive]
6. Deviation Detection [Predictive]
(C) Define data classification.
Finding a model for the class attribute as a function of the values of the other attributes.
(D) Complete the following figure of a classification model.
[Figure: classification model. Labels: Training Set, Test Set, Learning Classifier, Model]
(E) What are the similarity measures used in clustering?
1. Euclidean Distance
2. Minkowski Distance
3. Mahalanobis Distance
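As an illustration of the three measures listed in (E), here is a minimal Python sketch. The function names, the sample points, and the choice to pass the inverse covariance matrix in directly are assumptions made for this example. The Mahalanobis distance is written in its square-root form; some course materials define it without the square root.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance of order r between two equal-length vectors."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def euclidean(x, y):
    """Euclidean distance, i.e. the Minkowski distance with r = 2."""
    return minkowski(x, y, 2)

def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance sqrt((x - y)^T S^-1 (x - y)).

    inv_cov is the inverse covariance matrix S^-1 as a list of rows.
    It is supplied by the caller (assumption); estimating it from data
    is outside the scope of this sketch.
    """
    d = [a - b for a, b in zip(x, y)]
    # tmp = S^-1 (x - y)
    tmp = [sum(row[j] * d[j] for j in range(len(d))) for row in inv_cov]
    return math.sqrt(sum(d[i] * tmp[i] for i in range(len(d))))

# Hypothetical points, chosen only to make the arithmetic easy to check.
p, q = [0.0, 0.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(euclidean(p, q))              # 5.0
print(minkowski(p, q, 1))           # 7.0 (city-block distance)
print(mahalanobis(p, q, identity))  # 5.0
```

With the identity matrix as the inverse covariance, the Mahalanobis distance reduces to the Euclidean distance, which gives a quick sanity check.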
(F) What are the challenges of data mining?
1. Scalability
2. Dimensionality
3. Complex and Heterogeneous Data
4. Data Quality
5. Data Ownership and Distribution
6. Privacy Preservation
7. Streaming Data
Question (3):
1- Draw the decision tree to classify records based on the class attribute (Class).
2- Find the class of each record in the tested set.
3- Calculate the measure of impurity of the Refund node using the GINI index.
Answer
Decision tree:
Refund
  Yes: NO
  No: MarSt
    Married: NO
    Single, Divorced: TaxInc
      < 80K: NO
      > 80K: YES
Tested set:
No.  Refund  MarSt    TaxInc (K)  Class
1    No      Single   90          Yes
2    Yes     Married  72          No
3    No      Single   95          Yes
$GINI(t) = 1 - \sum_j [p(j \mid t)]^2$
Gini index for the Refund node:
Refund = Yes: C0 = 4, C1 = 3
$P(C0) = 4/7$, $P(C1) = 3/7$
$Gini = 1 - P(C0)^2 - P(C1)^2 = 1 - 0.3265 - 0.1837 = 0.4898$
Refund = No: C0 = 3, C1 = 0
$P(C0) = 3/3 = 1$, $P(C1) = 0/3 = 0$
$Gini = 1 - P(C0)^2 - P(C1)^2 = 1 - 1 - 0 = 0$
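The Gini arithmetic can be checked with a short Python sketch. It uses the class counts listed in the tables above, (C0 = 4, C1 = 3) and (C0 = 3, C1 = 0); the function and variable names are assumptions made for this example. With these counts the formula gives 24/49 ≈ 0.4898 for the first node and 0 for the second. The size-weighted Gini of the split is an extra step that the answer does not compute, shown only for completeness.

```python
def gini(counts):
    """Gini index of a node: 1 - sum_j p(j|t)^2, from per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_gini(children):
    """Gini of a split: each child's Gini weighted by its share of the records."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini(c) for c in children)

# Class counts [C0, C1] of the two child nodes, as listed in the answer.
first_node = [4, 3]
second_node = [3, 0]

print(round(gini(first_node), 4))                          # 0.4898
print(gini(second_node))                                   # 0.0
print(round(weighted_gini([first_node, second_node]), 4))  # 0.3429
```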
Question (4):
A) How is the best split determined in the tree induction classification technique?
1. Greedy approach: nodes with a homogeneous class distribution are preferred.
2. A measure of node impurity is needed.
B) What are the measures of impurity of a split?
1. Gini index
2. Entropy
3. Misclassification error
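To compare the three impurity measures named in B), the following Python sketch evaluates each of them on a few class-count vectors. The function names and the sample counts are assumptions made for this example. All three measures are 0 for a pure node and largest for an evenly split node.

```python
import math

def class_probs(counts):
    """Relative class frequencies p(j|t) of a non-empty node."""
    n = sum(counts)
    return [c / n for c in counts]

def gini(counts):
    """Gini index: 1 - sum_j p(j|t)^2."""
    return 1.0 - sum(p * p for p in class_probs(counts))

def entropy(counts):
    """Entropy: -sum_j p(j|t) * log2 p(j|t), with 0 * log2(0) taken as 0."""
    return sum(-p * math.log2(p) for p in class_probs(counts) if p > 0)

def misclassification_error(counts):
    """Misclassification error: 1 - max_j p(j|t)."""
    return 1.0 - max(class_probs(counts))

# Hypothetical class counts: a pure node, a mixed node, an evenly split node.
for counts in ([3, 0], [4, 3], [5, 5]):
    print(counts,
          round(gini(counts), 4),
          round(entropy(counts), 4),
          round(misclassification_error(counts), 4))
```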
C) Construct a rule-based classifier for Question (3).
R1: (Refund = Yes) → No
R2: (Refund = No) ∧ (Status = Married) → No
R3: (Refund = No) ∧ (Status = Single/Divorced) ∧ (TaxInc < 80K) → No
R4: (Refund = No) ∧ (Status = Single/Divorced) ∧ (TaxInc >= 80K) → Yes
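Rules R1 to R4 can be sketched as an ordered rule list in Python and applied to the three records of the tested set from Question (3). The dictionary keys, the `classify` helper, and the encoding of taxable income in thousands are assumptions made for this example.

```python
# Each rule is (name, condition, class label); rules are tried in order.
# TaxInc is assumed to be given in thousands (80 means 80K).
RULES = [
    ("R1", lambda r: r["Refund"] == "Yes", "No"),
    ("R2", lambda r: r["Refund"] == "No" and r["Status"] == "Married", "No"),
    ("R3", lambda r: r["Refund"] == "No"
                     and r["Status"] in ("Single", "Divorced")
                     and r["TaxInc"] < 80, "No"),
    ("R4", lambda r: r["Refund"] == "No"
                     and r["Status"] in ("Single", "Divorced")
                     and r["TaxInc"] >= 80, "Yes"),
]

def classify(record, default=None):
    """Return (rule name, class) of the first rule that covers the record."""
    for name, condition, label in RULES:
        if condition(record):
            return name, label
    return None, default  # no rule fired

# The three records of the tested set from Question (3).
tested_set = [
    {"Refund": "No", "Status": "Single", "TaxInc": 90},
    {"Refund": "Yes", "Status": "Married", "TaxInc": 72},
    {"Refund": "No", "Status": "Single", "TaxInc": 95},
]

for record in tested_set:
    print(classify(record))
# ('R4', 'Yes')
# ('R1', 'No')
# ('R4', 'Yes')
```

The four rules are mutually exclusive for these attribute values, so the order in which they are tried does not change the result here.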