Q# 1. Questions

i. Data mining is
a) The actual discovery phase of a knowledge discovery process
b) None of these
c) The stage of selecting the right data for a KDD process
d) A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management

ii. A cluster is
a) A group of similar objects that differ significantly from other objects
b) Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c) A symbolic representation of facts or ideas from which information can potentially be extracted
d) None of these

iii. OLAP stands for
a) Online analytical processing
b) Online analysis processing
c) Online transaction processing
d) Online aggregate processing

iv. To integrate heterogeneous databases, how many approaches are there in data warehousing?
a) 2
b) 3
c) 4
d) 5

v. __________ is a system where operations like data extraction, transformation, and loading are executed.
a) Data staging
b) Data integration
c) ETL
d) None of the mentioned

vi. What is the use of data cleaning?
a) To remove noisy data
b) To correct inconsistencies in data
c) To apply transformations that correct wrong data
d) All of the above

vii. Data mining system classification consists of:
a) Database technology
b) Machine learning
c) Information science
d) All of the above

viii. Data selection is
a) The actual discovery phase of a knowledge discovery process
b) The stage of selecting the right data for a KDD process
c) A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management
d) None of these

ix. Background knowledge refers to
a) Additional acquaintance used by a learning algorithm to facilitate the learning process
b) A neural network that makes use of a hidden layer
c) A form of automatic learning
d) None of these

x. Data that can be modeled as dimension attributes and measure attributes are called _______ data.
a) Multi dimensional
b) Single dimensional
c) Measured
d) Dimensional

Q# 1. Questions

i. Can the FP-growth algorithm be used if the FP-tree cannot fit in memory?
a. Yes
b. No

ii. What are maximal frequent itemsets?
a. A frequent itemset none of whose super-itemsets is frequent
b. A frequent itemset whose super-itemset is also frequent
c. A non-frequent itemset whose super-itemset is frequent
d. None of the above

iii. What are Max_confidence, Cosine similarity, and All_confidence?
a. Frequent pattern mining algorithms
b. Measures to improve the efficiency of Apriori
c. Pattern evaluation measures
d. None of the above

iv. Which technique finds the frequent itemsets in just two database scans?
a. Partitioning
b. Sampling
c. Hashing
d. Dynamic itemset counting

v. How do you calculate Confidence(A -> B)?
a. Support(A B) / Support(A)
b. Support(A B) / Support(B)
c. Support(A B) / Support(A)
d. Support(A B) / Support(B)

vi. How many cells does an iceberg cube have if each dimension has exactly two distinct values and only the base cuboid does not satisfy the iceberg condition?
a. 2^n
b. 3^n
c. 3^n - 2^n
d. 3^n - 1

vii. Which of the following algorithms comes under classification? Select one:
a. Apriori
b. Brute force
c. DBSCAN
d. K-nearest neighbor

viii. ______ consists of formal definitions, such as a COBOL layout or a database schema.
a. Classical metadata
b. Transformation metadata
c. Historical metadata
d. Structural metadata

ix. Detail data in a single fact table is otherwise known as __________.
a. Monoatomic data
b. Diatomic data
c. Atomic data
d. Multiatomic data

x. The data set {brown, black, blue, green, red} is an example of: Select one:
a. Continuous attribute
b. Ordinal attribute
c. Numeric attribute
d. Nominal attribute

Q# 1. Questions

i. Agglomerative clustering uses a
a. Bottom-up approach
b. Top-down approach
c. Both
d. None

ii. Divisive clustering uses a
a. None
b. Top-down approach
c. Bottom-up approach
d. Both

iii. Which of the following statements is true for k-NN classifiers?
a. The classification accuracy is better with larger values of k
b. The decision boundary is smoother with smaller values of k
c. The decision boundary is linear
d. k-NN does not require an explicit training step

iv. To detect fraudulent usage of credit cards, the following data mining task should be used:
a. Outlier analysis
b. Prediction
c. Association analysis
d. Feature selection

v. Data scrubbing can be defined as
a. Checking field overloading
b. Deleting redundant tuples
c. Using simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
d. Analyzing data to discover rules and relationships to detect violators

vi. Which of the following is also referred to as an overlayed 1D plot?
a. lattice
b. Barplot
c. Gplot
d. All of the mentioned

vii. _______ are numeric measurements or values that represent a specific business aspect or activity.
a. Dimensions
b. Schemas
c. Facts
d. Tables

viii. A fact is said to be fully additive if __________.
a. It is additive over at least one of the dimensions
b. Only numeric measures are used
c. All possible summaries are stored
d. It is additive over every dimension of its dimensionality

ix. Background knowledge refers to
a) Additional acquaintance used by a learning algorithm to facilitate the learning process
b) A neural network that makes use of a hidden layer
c) A form of automatic learning
d) None of these

x. Data that can be modeled as dimension attributes and measure attributes are called _______ data.
a) Multi dimensional
b) Single dimensional
c) Measured
d) Dimensional
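
Worked note on the support and confidence terms used in the second question set. The sketch below is illustrative only and not part of the question paper: the transactions are invented, and the helper names `support` and `confidence` are hypothetical. It assumes the usual textbook definitions, where the support of an itemset is the fraction of transactions containing all of its items, and the confidence of a rule A -> B is the support of A and B taken together divided by the support of A.

```python
from fractions import Fraction

# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Support of A and B together, divided by support of A.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"bread"}, TRANSACTIONS))               # 3/4
print(confidence({"bread"}, {"milk"}, TRANSACTIONS))  # 2/3
```

Exact fractions are used instead of floats so the ratios can be checked by hand: bread appears in 3 of 4 transactions, bread and milk together in 2 of 4, so the confidence of bread -> milk is (2/4) / (3/4) = 2/3.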