Kingdom of Saudi Arabia
Ministry of Higher Education
Majmaah University
Vice Rectorate for Academic Affairs
Measurement & Assessments Administration

Model Answer of the Final Examination for Data Mining
Second semester 1434 / 1435 H

College: …Science in az Zulfi…
Program: CSI Dept.
Course Name: Data Mining
Course Code: CSI 449-Z
Section: 273
Date: 20-7-1435
Duration: two hours
Number of pages: 5
The student's name:
University ID:

Examination Guidelines
1- Write your name and university identification number clearly in the space provided.
2- Use a blue or black pen for answers and a pencil for drawings.
3- Books, notes, papers and publications are not allowed in the examination room.
4- Students are not allowed to leave the examination room until 30 minutes after the start of the exam.

Learning Outcomes
a. Knowledge
b. Cognitive skills
c. Interpersonal skills and taking responsibility
d. Communication, information technology and numerical skills
e. Psychomotor skills

Grades
Question | Learning outcome | Grade
1        | a                | /…………….
2        | a, b             | /…………….
3        | b, c             | /…………….
4        | c, d             | /…………….
5        | ………………..         | /…………….
Final grade: ...…../….......

Faculty member, Corrector 1: Dr. Weal Khedr.   Signature: /…………….
Corrector 2: /…………….   Signature: /…………….
Review Committee: /…………….   Signature: /…………….

Question (1): Choose the correct answer for each of the following. The correct answers are listed in the answer key after item 10.

1) The values of a ------------ attribute are just different names and provide only enough information to distinguish one object from another.
   a. Ratio   b. Interval   c. Ordinal   d. Nominal
2) The values of a/an ------------ attribute provide enough information to order objects.
   a. Ratio   b. Interval   c. Ordinal   d. Nominal
3) For ------------ attributes, the differences between values are meaningful, i.e., a unit of measurement exists.
   a. Ratio   b. Interval   c. Ordinal   d. Nominal
4) A type of data set that is based on a sequence of data or transactions.
   a. Record   b. Graph   c. Ordered   d. Data Matrix
5) It reduces the amount of time and memory required by data mining algorithms.
   a. Data Reduction   b. Data Mining   c. Data aggregation   d. Data matrix
6) It is the main technique employed for data selection.
   a. Noise   b. Sampling   c. Clustering   d. Histogram
7) Combining two or more attributes (or objects) into a single attribute (or object).
   a. Noise   b. Sampling   c. Aggregation   d. Histogram
8) It maps data to a new space (frequency domain).
   a. Aggregation   b. Data Reduction   c. Fourier transform   d. Sampling
9) It refers to modification of original values.
   a. Aggregation   b. Data selection   c. Noise   d. Clustering
10) Classification of records can be done by using a collection of -------------based classifier.
   a. Rules   b. Clusters   c. Decision tree   d. Measure of Impurity

Answer key
Question | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Answer   | d | c | b | c | a | b | c | c | c | a

Question (2): Complete the following.

(A) Define Data Mining.
Non-trivial extraction of implicit, previously unknown and potentially useful information from data.
or
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.

(B) What are the Data Mining tasks?
1. Classification [Predictive]
2. Clustering [Descriptive]
3. Association Rule Discovery [Descriptive]
4. Sequential Pattern Discovery [Descriptive]
5. Regression [Predictive]
6. Deviation Detection [Predictive]

(C) Define Data Classification.
Find a model for the class attribute as a function of the values of the other attributes.

(D) Complete the following figure of a classification model.
Labels to fill in: Test Set, Training Set, Learning, Classifier.

(E) What are the similarity measures used in clustering?
1. Euclidean Distance
2. Minkowski Distance
3. Mahalanobis Distance

(F) What are the challenges of Data Mining?
1. Scalability
2. Dimensionality
3. Complex and Heterogeneous Data
4. Data Quality
5. Data Ownership and Distribution
6. Privacy Preservation
7. Streaming Data Model

Question (3):
1- Draw the decision tree to classify records based on the class attribute (class).
2- Find the class of each record in the tested set.
3- Calculate the measure of impurity of the Refund node using GINI.

Answer

1- Decision tree:

Refund
  Yes: NO
  No: MarSt
    Married: NO
    Single, Divorced: TaxInc
      < 80K: NO
      > 80K: YES

2- Tested set:

No. | Refund | MarSt   | TaxInc (K) | Class
1   | No     | Single  | 90         | Yes
2   | Yes    | Married | 72         | No
3   | No     | Single  | 95         | Yes

3- Gini index for a given node t:

GINI(t) = 1 - \sum_j [p(j | t)]^2

Child node of the Refund split with C0 = 4, C1 = 3:
P(C0) = 4/7, P(C1) = 3/7
Gini = 1 - P(C0)^2 - P(C1)^2 = 1 - 0.32653 - 0.18367 = 0.4898

Child node of the Refund split with C0 = 3, C1 = 0:
P(C0) = 3/3 = 1, P(C1) = 0/3 = 0
Gini = 1 - P(C0)^2 - P(C1)^2 = 1 - 1 - 0 = 0

Question (4):

A) How is the best split determined in the tree induction classification technique?
1. Greedy approach: nodes with a homogeneous class distribution are preferred.
2. A measure of node impurity is needed.

B) What are the measures of impurity of a split?
1. Gini Index
2. Entropy
3. Misclassification error

C) Construct a rule-based classifier for the tree of Question (3).
R1: (Refund = Yes) → No
R2: (Refund = No) ∧ (Status = Married) → No
R3: (Refund = No) ∧ (Status = Single/Divorced) ∧ (TaxInc < 80K) → No
R4: (Refund = No) ∧ (Status = Single/Divorced) ∧ (TaxInc >= 80K) → Yes
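
The rule set of Question (4)(C) can be checked against the tested set of Question (3) with a short script. This is only a sketch: the function name `classify`, the tuple layout, and the choice to express taxable income in thousands are assumptions made for illustration, not part of the model answer.

```python
def classify(refund: str, status: str, tax_inc_k: float) -> str:
    """Apply rules R1-R4 in order and return the predicted class."""
    if refund == "Yes":        # R1: Refund = Yes -> No
        return "No"
    if status == "Married":    # R2: Refund = No and Married -> No
        return "No"
    if tax_inc_k < 80:         # R3: Single/Divorced and TaxInc < 80K -> No
        return "No"
    return "Yes"               # R4: Single/Divorced and TaxInc >= 80K -> Yes


# Tested set from Question (3): (Refund, Status, TaxInc in thousands)
tested_set = [
    ("No", "Single", 90),
    ("Yes", "Married", 72),
    ("No", "Single", 95),
]

predictions = [classify(*record) for record in tested_set]
print(predictions)  # ['Yes', 'No', 'Yes']
```

The predictions agree with the classes listed for the tested set. Because the rules are mutually exclusive and cover every combination of attribute values, the order in which they are checked does not change the result.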
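
The three impurity measures named in Question (4)(B) can be evaluated on the class counts of the Refund split from Question (3). The sketch below is illustrative: the helper names (`gini`, `entropy`, `misclassification_error`, `weighted_impurity`) are hypothetical, and the weighted impurity of the split is the usual size-weighted average of the child impurities, which goes one step beyond what the model answer writes out.

```python
from math import log2


def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2 for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy(t) = -sum_j p(j|t) * log2 p(j|t), with 0 * log2(0) taken as 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def misclassification_error(counts):
    """Error(t) = 1 - max_j p(j|t)."""
    return 1.0 - max(counts) / sum(counts)


def weighted_impurity(children, measure):
    """Size-weighted average impurity of the child nodes of a split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * measure(child) for child in children)


# Class counts [C0, C1] of the two child nodes of the Refund split
impure_child = [4, 3]
pure_child = [3, 0]

print(round(gini(impure_child), 4))                     # 0.4898
print(round(gini(pure_child), 4))                       # 0.0
print(round(entropy(impure_child), 4))                  # 0.9852
print(round(misclassification_error(impure_child), 4))  # 0.4286
print(round(weighted_impurity([impure_child, pure_child], gini), 4))  # 0.3429
```

All three measures are zero for the pure child and largest when the classes are evenly mixed, which is why a greedy splitter prefers the split with the lowest weighted impurity.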
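
Two of the clustering similarity measures listed in Question (2)(E) are closely related: the Minkowski distance with parameter r = 2 is the Euclidean distance, and r = 1 gives the city-block distance. A minimal sketch follows, assuming points are given as equal-length sequences of numbers; the function names are hypothetical. The Mahalanobis distance is left out because it also needs the inverse covariance matrix of the data.

```python
def minkowski_distance(p, q, r=2):
    """Minkowski distance: (sum_k |p_k - q_k|^r)^(1/r)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


def euclidean_distance(p, q):
    """Euclidean distance is the Minkowski distance with r = 2."""
    return minkowski_distance(p, q, r=2)


x = (0.0, 0.0)
y = (3.0, 4.0)
print(euclidean_distance(x, y))         # 5.0
print(minkowski_distance(x, y, r=1))    # 7.0
```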