Majmaah University
College of Science in Zolfi
Dept. of Computer Science & Info.
Computer Science Bridging Program
Data Mining
Model Answer of Homework (1)
6-1-1437

1- Why data mining (scientific view)?
Huge amounts of data are collected and stored at enormous speeds (GB/hour).

2- Data mining may help scientists:
- in classifying and segmenting data
- in hypothesis formation

1. _____b_____ is a non-trivial extraction of implicit, previously unknown and potentially useful information from data.
a) Data warehousing  b) Data mining  c) Text mining  d) Data selection

2. _____b_____ is an essential process where intelligent methods are applied to extract data patterns; it is also referred to as Knowledge Discovery in Databases.
a) Data warehousing  b) Data mining  c) Text mining  d) Data selection

3. The two fundamental goals of data mining are _____c_____.
a) Analysis and Description  b) Data cleaning and organizing the data  c) Prediction and Description  d) Data cleaning and organizing the data

4. _____b_____ is the process of finding a model that describes and distinguishes data classes or concepts.
a) Data Characterization  b) Data Classification  c) Data Clustering  d) Data Selection

5. A cluster is _____a_____.
a) A group of similar objects that differ significantly from other objects
b) Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c) A symbolic representation of facts or ideas from which information can potentially be extracted
d) None of these

6. In the clustering algorithm, the distance between the cluster centroid and each object is calculated using the _____a_____ method.
a) Euclidean distance  b) Clustering distance  c) Central distance  d) Cluster

7. The classification task refers to _____a_____.
a) A subdivision of a set of examples into a number of classes
b) A measure of the accuracy of the classification of a concept that is given by a certain theory
c) The task of assigning a classification to a set of examples
d) None of these

3- What is data mining and what is not data mining?
a) Looking up a phone number in a phone directory (Not data mining)
b) Querying a Web search engine for information about "Amazon" (Not data mining)
c) Discovering that certain names are more prevalent in certain US locations (Data mining)
d) Grouping together similar documents returned by a search engine according to their context (Data mining)

4- Which fields does data mining draw ideas from?
Data mining draws ideas from machine learning/AI, pattern recognition, and statistics.
[Figure: Data Mining at the overlap of Statistics/AI and Machine Learning and Pattern Recognition]

5- What are data mining tasks?
A) Prediction methods:
- Classification
- Regression
- Deviation Detection
B) Description methods:
- Clustering
- Association Rule Discovery
- Sequential Pattern Discovery

C) What are the main steps to extract knowledge/information from data?
1- Data (input problem)
2- Selection (selected data)
3- Preprocessing (preprocessed data)
4- Transformation (transformed data)
5- Data mining (patterns)
6- Interpretation/Evaluation (knowledge)

D) This figure is a model of _____Classification_____.
[Figure: Training Set → Learn Classifier → Model, which is then applied to the Test Set]
Application of the model: the image (star or galaxy)
- Segment: the image (star/galaxy)
- Measure: image attributes (features)
- Model: the prediction class (star or galaxy)
- Success story: could find some new stars or galaxies

E) Define data clustering.
A cluster is a group of similar objects that differ significantly from other objects.

F) Define regression.
Regression predicts the value of a given variable based on the values of other variables, assuming a linear or nonlinear model of dependency.

G) What are the similarity measures of clustering?
The similarity measure of clustering is the Euclidean distance.

H) What are the challenges of data mining?
1. Scalability
2. Dimensionality
3. Complex and Heterogeneous Data
4. Data Quality
5. Data Ownership and Distribution
6. Privacy Preservation
7. Streaming Data
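Question 6 and answer G both name the Euclidean distance, d(p, q) = sqrt(sum over i of (p_i - q_i)^2), as the measure between a cluster centroid and each object. The minimal Python sketch below shows this calculation. It takes the centroid to be the attribute-wise mean and then measures the distance from the centroid to every object. The function names and the 2-D points are hypothetical and serve only as an illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance: square root of the sum of squared attribute differences."""
    if len(p) != len(q):
        raise ValueError("objects must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(objects):
    """Cluster centroid taken as the attribute-wise mean of the objects."""
    n = len(objects)
    return tuple(sum(values) / n for values in zip(*objects))


# Hypothetical cluster of 2-D objects (made-up values for illustration only).
cluster = [(1.0, 2.0), (3.0, 2.0), (2.0, 5.0)]
c = centroid(cluster)
distances = [euclidean(c, obj) for obj in cluster]
print(c, distances)
```

Here the centroid is (2.0, 3.0). The distances to the three objects are about 1.414, 1.414 and 2.0.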
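The six steps in answer C can be sketched as a chain of small Python functions, one per step. This is only an illustration under explicit assumptions. The shopping-basket records, the store filter and the cleaning rules are all hypothetical. So is the use of item-pair counting as the "data mining" step, which is a simple association-style pattern. None of these choices come from the model answer.

```python
from collections import Counter
from itertools import combinations

# Data (input problem): hypothetical (store, purchased items) records; values are made up.
raw_data = [
    ("A", ["Bread", "milk "]),
    ("A", ["bread", "Milk", "eggs"]),
    ("B", ["tea"]),
    ("A", []),
    ("A", ["bread", "eggs"]),
]


def select(data, store):
    """Selection: keep only the records relevant to the question (one assumed store)."""
    return [items for s, items in data if s == store]


def preprocess(transactions):
    """Preprocessing: drop empty records and clean item names."""
    return [[item.strip().lower() for item in t] for t in transactions if t]


def transform(transactions):
    """Transformation: represent each transaction as a set of items."""
    return [frozenset(t) for t in transactions]


def mine(transactions):
    """Data mining (assumed technique): count how often each item pair is bought together."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts


def interpret(pair_counts, n, min_support):
    """Interpretation/Evaluation: keep pairs whose support (fraction of baskets) meets a threshold."""
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


transactions = transform(preprocess(select(raw_data, "A")))
knowledge = interpret(mine(transactions), len(transactions), min_support=0.5)
print(knowledge)
```

With the made-up records, preprocessing removes the empty basket. The pairs ("bread", "eggs") and ("bread", "milk") each appear in 2 of the 3 remaining baskets, so both pass the assumed 0.5 support threshold.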
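The classification figure in answer D (Training Set, Learn Classifier, Model, Test Set) can be illustrated with a minimal Python sketch. The sketch assumes a nearest-centroid classifier, which is only one simple choice and not a method the figure specifies. It also assumes two hypothetical numeric image attributes per object. The labels mirror the star/galaxy example, but every feature value is made up.

```python
import math


def learn_classifier(training_set):
    """'Learn Classifier' step (assumed nearest-centroid model).

    training_set is a list of (features, label) pairs; the returned model maps
    each label to the mean feature vector of its training examples.
    """
    sums, counts = {}, {}
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, features):
    """Apply the model: pick the label whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(features, model[label]))


# Hypothetical (attribute_1, attribute_2) values per image object; all numbers are made up.
training_set = [
    ((0.9, 0.1), "star"),
    ((0.8, 0.2), "star"),
    ((0.2, 0.9), "galaxy"),
    ((0.3, 0.8), "galaxy"),
]
test_set = [((0.85, 0.2), "star"), ((0.25, 0.75), "galaxy")]

model = learn_classifier(training_set)
correct = sum(predict(model, f) == label for f, label in test_set)
accuracy = correct / len(test_set)
print(model, accuracy)
```

Both hypothetical test objects receive their expected labels, so the accuracy on this toy test set is 1.0. Real image data would need many more examples and a properly evaluated classifier.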
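Answer E defines a cluster as a group of similar objects that differ significantly from other objects. One common way to form such groups is k-means. It repeatedly assigns every object to its nearest centroid by Euclidean distance, then recomputes each centroid. The sketch below is a minimal version under stated assumptions. The choice of k-means, the fixed starting centroids, the iteration limit and the toy points are all illustrative and are not prescribed by the model answer.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two objects."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Attribute-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(values) / n for values in zip(*points))


def kmeans(points, initial_centroids, max_iterations=10):
    """Minimal k-means; returns the final centroids and the clusters of the last assignment step."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each object joins the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D objects and starting centroids (made-up values).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids, clusters)
```

With these starting centroids, the points split into the three objects near (1, 1) and the three near (8.5, 8.5). The loop stops after the second pass because the centroids no longer change.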
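Answer F defines regression as predicting one variable from others under a linear or nonlinear model. For the linear case, a straight line y = a + b·x can be fitted with ordinary least squares and then used for prediction. The minimal sketch below assumes a single explanatory variable and a linear dependency, and it does not cover the nonlinear case. The data points are made up so that they lie exactly on y = 1 + 2x.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x for one explanatory variable."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
prediction = a + b * 5.0  # predicted value of y for x = 5
print(a, b, prediction)
```

With these points, the fit recovers a = 1 and b = 2, so the predicted value for x = 5 is 11.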