Second Midterm 18-2-1437
1st semester 1436 / 1437H
In Data Mining Course

College:…Science in az Zulfi………
Program: CSI Dept.
Course Name: Data Mining
Course Code: CSI 449
Section:
Date: 28-12-1436
Number of pages: 4
The student's name:
University ID:

Q1: Choose the right answer. [10 marks]

1. Knowledge is referred to as ----------
a) Non-trivial extraction of implicit, previously unknown and potentially useful information from data
b) Set of columns in a database table that can be used to identify each record within this table uniquely
c) Collection of interesting and useful patterns in a dataset
d) None of these

2. Which of the following is not a data mining functionality?
a) Characterization and discrimination
b) Classification and regression
c) Selection and interpretation
d) Clustering and analysis

3. Classification is ----------
a) A subdivision of a set of examples into a number of classes
b) A measure of the accuracy of the classification of a concept that is given by a certain theory
c) The task of assigning a classification to a set of examples
d) None of these

4. Cluster is ----------
a) Symbolic representation of facts or ideas from which information can potentially be extracted
b) Group of similar objects that differ significantly from other objects
c) Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
d) None of these

5. Data selection is ----------
a) The actual discovery phase of a knowledge discovery process
b) The stage of selecting the right data for a KDD process
c) A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management
d) None of these

6. Classification task refers to ----------
a) A subdivision of a set of examples into a number of classes
b) A measure of the accuracy of the classification of a concept that is given by a certain theory
c) The task of assigning a classification to a set of examples
d) None of these

7. Euclidean distance measure is ----------
a) A stage of the data process in which new data is added to the existing selection
b) The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
c) The distance between two points as calculated using the || ||2 distance
d) The distance between two points as calculated using the || ||n distance

8. Nationality, ethnicity, language, style, biological species, and zip codes are examples of ----------

9. Mass, length, duration, plane angle, energy and electric charge are examples of ----------
a) Nominal
b) Ordinal
c) Ratio
d) Interval

10. Median, percentiles, and rank correlation are operations that can be computed on
a) Nominal data
b) Ordinal data
c) Ratio data
d) Interval data

11. Extract meaningful numeric indices from the text and thus make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms.

12. The World Wide Web is a data type of ----------
a) Graph data
b) Ordered data
c) Record data
d) Web data
e) Document

13. Noise refers to ----------
a) Modification of feature values
b) Modification of original values
c) An error in original values
d) Combining two or more attributes into a single attribute

14. ---------- are data objects with characteristics that are considerably different from most of the other data objects in the data set.
a) Outliers
b) Missing values
c) Duplicate data
d) Data preprocessing

15. Aggregation is ----------
a) Combining two or more attributes into a single attribute
b) The main technique employed for data selection
c) The mean average of aggregated data

16. Dimensionality reduction is used to ----------
a) Reduce the amount of time and memory required by data mining algorithms; it may also help to eliminate irrelevant features or reduce noise
b) Map data to a new space
c) Create new attributes that can capture the important information in a data set much more efficiently than the original attributes

17. Create new attributes that can capture the important information in a data set much more efficiently than the original attributes
a) Feature creation
b) Feature selection
c) Feature extraction
d) All

18. Numerical measure of how alike two data objects are
a) Irrelevant
b) Dissimilarity
c) Similarity
d) All

19. Minkowski distance is a generalization of Euclidean distance, and it is equivalent to Euclidean distance when r is equal to
a) 0
b) 1
c) 2
d) Infinity

20. Similarity between binary vectors can be measured by
a) Minkowski distance
b) Jaccard coefficient
c) Euclidean distance
d) Mean

Q2: Complete the following. [5 marks]

A) What are the data quality problems?
1-
2-
3-

B) What are the data preprocessing techniques?
1-
2-
3-
4-

C) Complete the following table that represents similarity and dissimilarity for simple attributes.

Q3: Compute the distance for each of the following data types. [5 marks]

A) P1 = (1, -2, 3) and P2 = (1, -1, 2)

B) S1 = (Fail, Good, Good, Better) and S2 = (Fail, Good, Better, Good)
Where: the range of the data type is {Fail, Pass, Good, V. Good, Better}

C) p = 1 0 0 1 0 0 1 0 0 1 and q = 1 1 0 0 1 0 1 0 0 1
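
Note: the measures named in Q1 (items 19 and 20) and applied in Q3 can be sketched in Python as below. This is only an illustrative sketch, not a model answer. The function names are hypothetical, and two conventions are assumed rather than taken from the exam: ordinal values are mapped to ranks 0..n-1 in the order the range is listed, and the per-attribute ordinal dissimilarities |rank1 - rank2| / (n - 1) are averaged over the attributes. Other courses may combine the attributes differently.

```python
import math


def minkowski(p, q, r=2):
    """Minkowski distance between two numeric vectors.

    r=1 gives the Manhattan distance, r=2 the Euclidean distance,
    and r=math.inf the supremum (maximum coordinate difference).
    """
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if r == math.inf:
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)


def ordinal_dissimilarity(s, t, scale):
    """Average rank-based dissimilarity between two ordinal vectors.

    Assumption: values are ranked 0..n-1 in the order given by `scale`,
    and the per-attribute dissimilarities are averaged.
    """
    rank = {value: i for i, value in enumerate(scale)}
    n = len(scale)
    per_attribute = [abs(rank[a] - rank[b]) / (n - 1) for a, b in zip(s, t)]
    return sum(per_attribute) / len(per_attribute)


def binary_similarity(p, q):
    """Return (SMC, Jaccard) for two equal-length 0/1 vectors."""
    f11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)
    f00 = sum(1 for a, b in zip(p, q) if a == 0 and b == 0)
    mismatches = sum(1 for a, b in zip(p, q) if a != b)  # f01 + f10
    smc = (f11 + f00) / len(p)
    # Assumption: Jaccard is reported as 0.0 when neither vector has a 1.
    denominator = f11 + mismatches
    jaccard = f11 / denominator if denominator else 0.0
    return smc, jaccard


# Data from Q3, entered by hand.
P1, P2 = (1, -2, 3), (1, -1, 2)
SCALE = ("fail", "pass", "good", "v. good", "better")
S1 = ("fail", "good", "good", "better")
S2 = ("fail", "good", "better", "good")
p = (1, 0, 0, 1, 0, 0, 1, 0, 0, 1)
q = (1, 1, 0, 0, 1, 0, 1, 0, 0, 1)

print("A) Euclidean distance:", minkowski(P1, P2, r=2))
print("B) Ordinal dissimilarity:", ordinal_dissimilarity(S1, S2, SCALE))
print("C) (SMC, Jaccard):", binary_similarity(p, q))
```

Setting `r=1` or `r=math.inf` in `minkowski` gives the Manhattan and supremum distances for the same points, which illustrates why Euclidean distance is treated as one special case of the Minkowski family.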