Ministry of Higher Education
Majmaah University
College of Science at Az-Zulfi
Dept. of Computer Science & Information
(Computer Bridging Program)

Second Midterm Exam of Data Mining

Name: / University ID:

Q1: Choose the right answer

1- The type of the attributes zip codes and employee ID numbers is -----a-----, but the type of the attributes hardness of minerals and street numbers is -----d-----.
2- The type of the attributes calendar dates and temperature is -----b-----, but the type of the attributes age, mass, length, and electrical current is -----c-----.
a) Nominal b) Interval c) Ratio d) Ordinal

3- Each document in document data is represented in -----a----- form, where each component (attribute) holds the number of times an item occurs in the document.
a) Vector b) Matrix c) Record d) Transaction

4- Genomic sequence data is a type of -----b-----
a) Graph data b) Ordered data c) Record data d) Chemical data

5- Noise and outliers, missing values, and duplicate data are problems of -----d-----
a) Processing data b) Mapping data c) Evaluating data d) Data quality

6- -----b----- refers to the modification of original values.
a) Missing values b) Noise c) Transform d) Selection

7- -----a----- is the combining of two or more attributes into a single attribute.
8- -----b----- is the main technique employed for data selection.
a) Aggregation b) Sampling c) Duplicate d) Transform

9- It reduces the amount of time and memory required by data mining algorithms, and it may help to eliminate irrelevant features or reduce noise.
a) Transaction data b) Handling missing values c) Dimensionality reduction d) Data processing

10- A cluster is -----a-----
a) A group of similar objects that differ significantly from other objects
b) Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c) A symbolic representation of facts or ideas from which information can potentially be extracted
d) None of these

11- Classification is -----a-----
12- The classification task refers to -----c-----
a) A subdivision of a set of examples into a number of classes
b) A measure of the accuracy of the classification of a concept that is given by a certain theory
c) The task of assigning a classification to a set of examples
d) None of these

13- The Euclidean distance measure is
a) A stage of the Knowledge Discovery process in which new data is added to the existing selection.
b) The process of finding a solution to a problem simply by enumerating all possible solutions according to some predefined order and then testing them.
c) The distance between two points as calculated.
d) None of these

14- Dimensionality reduction techniques include
a) Principal Component Analysis b) Sampling c) Aggregation d) All

15- Mapping data to a new space (frequency domain) is done by
a) Fourier transform b) Wavelet transform c) Entropy d) a and b

16- A numerical measure of how alike two data objects are
a) Irrelevant b) Dissimilarity c) Similarity d) All

17- Creating new attributes that can capture the important information in a data set much more efficiently than the original attributes
a) Feature creation b) Feature selection c) PCA d) All

18- The simplest approach is to divide the region into a number of rectangular cells of equal volume and define density as the number of points each cell contains
a) Euclidean distance b) Probability density c) Euclidean density d) None

19- Minkowski distance is a generalization of Euclidean distance, and it is equivalent to Euclidean distance when r is equal to
a) 0 b) 1 c) 2 d) infinity

20- Similarity between binary vectors can be measured by
a) Minkowski distance b) Jaccard coefficients c) Euclidean distance d) All

///////////////////////////////////////////////////////////////////////////////////////////////////////

Q2: A) What are the types of data sets and the data quality problems?
1. Graph data
2. Ordered data
3. Record data

B) Complete the following table, which represents similarity and dissimilarity for simple attributes.

Q3) a) Compute the Euclidean distance between each pair of the points p1, p2, p3, p4.

Euclidean distance: $\mathrm{dist}(p, q) = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$

dist(p1, p2) = ((P1x - P2x)^2 + (P1y - P2y)^2)^0.5
             = ((0 - 2)^2 + (2 - 0)^2)^0.5
             = (4 + 4)^0.5 = (8)^0.5 = 2.83

And so on for the remaining pairs.
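The "and so on" step in Q3 a) can be checked with a short script. This is only a sketch, assuming the four points are p1 = (0, 2), p2 = (2, 0), p3 = (3, 1), p4 = (5, 1) as read from the exam's x/y table; the names `points`, `euclidean`, and `matrix` are illustrative and not part of the exam.

```python
import math

# Assumed coordinates, read from the x/y table that accompanies Q3 a).
points = {"p1": (0, 2), "p2": (2, 0), "p3": (3, 1), "p4": (5, 1)}


def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))


names = list(points)

# Full symmetric distance matrix, rounded to two decimals
# to match the precision of the worked example.
matrix = {
    a: {b: round(euclidean(points[a], points[b]), 2) for b in names}
    for a in names
}

for a in names:
    print(a, [matrix[a][b] for b in names])
```

The p1 row starts with 0.0 and 2.83, which agrees with the hand computation of dist(p1, p2) above.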
Points for Q3 a):

point   x   y
p1      0   2
p2      2   0
p3      3   1
p4      5   1

B) Compute the similarity between the binary vectors p and q:
p = 1 0 0 0 0 0 0 0 0 0
q = 0 0 0 0 0 0 1 0 0 1

Answer
M01 = 2 (the number of attributes where p is 0 and q is 1)
M10 = 1 (the number of attributes where p is 1 and q is 0)
M00 = 7 (the number of attributes where p is 0 and q is 0)
M11 = 0 (the number of attributes where p is 1 and q is 1)

SMC = (M11 + M00) / (M01 + M10 + M11 + M00) = (0 + 7) / (2 + 1 + 0 + 7) = 0.7
J = M11 / (M01 + M10 + M11) = 0 / (2 + 1 + 0) = 0
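The SMC and Jaccard values in Q3 B) can be verified with a small script. This is a minimal sketch: the function name `binary_similarity` is hypothetical, and returning 0.0 for the Jaccard coefficient when both vectors are all zeros is an assumed convention that the exam does not address.

```python
def binary_similarity(p, q):
    """Return (SMC, Jaccard) for two equal-length sequences of 0s and 1s."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")

    # Count the four kinds of attribute pairs.
    m11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)
    m00 = sum(1 for a, b in zip(p, q) if a == 0 and b == 0)
    m10 = sum(1 for a, b in zip(p, q) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(p, q) if a == 0 and b == 1)

    # Simple matching coefficient: matches over all attributes.
    smc = (m11 + m00) / (m01 + m10 + m11 + m00)

    # Jaccard ignores 0-0 matches; 0.0 for two all-zero vectors is an assumption.
    denom = m01 + m10 + m11
    jaccard = m11 / denom if denom else 0.0
    return smc, jaccard


p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
smc, jaccard = binary_similarity(p, q)
print(smc, jaccard)  # 0.7 0.0
```

The gap between the two results shows why Jaccard is preferred for sparse binary data: the seven 0-0 matches inflate SMC even though p and q share no 1s.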