DWDM MCQ questions Q1. Which of the following methods do we use to find the best fit line for data in Linear Regression? A) Least Square Error B) Maximum Likelihood C) Logarithmic Loss D) Both A and B Ans=(a) Q2.______ refers loosely to the process of semi-automatically analyzing large databases to find useful patterns. (a) (b) (c) (d) Data mining Data warehousing DBMS Data mirroring Ans= (a) Q3.Consider the following two statements S1: Data scrubbing is a process to upgrade the quality of data, before it is moved into data warehouse. S2: Data scrubbing is a process of rejecting data from data warehouse to create indexes. Which one of the following options is correct ? (a)S1 true, S2 false (b)S1 false, S2 true (c) both S1 and S2 false (d)both S1 and S2 true Ans= (a) Q4. The most common source of change data in refreshing a data warehouse is: (a) Queryable change data (b) Cooperative change data (c) Logged change data (d) Snapshot change data Ans: (d) Q5. Data transformation includes which of the following? (a) A process to change data from a detailed level to a summary level (b) A process to change data from a summary level to a detailed level (c) Joining data from one source into various sources of data (d) Separating data from one source into various sources of data Ans: A Q6. Data warehouse contains ……………. data that is never found in the operational environment. A) normalized B) informational C) summary D) denormalized Ans= (c) Q7. The extract process is which of the following? (a) Capturing all of the data contained in various operational systems (b) Capturing a subset of the data contained in various operational systems (c) Capturing all of the data contained in various decision support systems (d) Capturing a subset of the data contained in various decision support systems Ans: B Q8. An operational system is which of the following? (a) A system that is used to run the business in real time and is based on historical data. (b) A system that is used to run the business in real time and is based on current data. (c) A system that is used to support decision making and is based on current data. (d) A system that is used to support decision making and is based on historical data. Ans: B Q9. Which of the following is not a kind of data warehouse application? A) Information processing B) Analytical processing C) Data mining D) Transaction processing And= (d) Q10. …………………….. supports basic OLAP operations, including slice and dice, drill-down, rollup and pivoting. A) Information processing B) Analytical processing C) Data mining D) Transaction processing Ans= (b) Q11. The data from the operational environment enter …………………… of data warehouse. A) Current detail data B) Older detail data C) Lightly Summarized data D) Highly summarized data Ans= (a) Q12. Which of the following statement is true? (a) The data warehouse consists of data marts and operational data (b) The data warehouse is used as a source for the operational data (c) The operational data are used as a source for the data warehouse (d) All of the above Ans: (c) Q13...........system is market oriented and is used for data analysis by knowledge workers including Managers, Executives and Analysts. (a) OLTP (b) OLAP (c) Data system (d) Market system Ans= (b) Q14. A data cube C, has n dimensions and each dimensions has exactly p distinct values in the base cuboid. Assume that there are no concept hierarchies associated with the dimensions. What is the maximum number of cells possible in the data cube, C ? (a)p^n (b) p (c) (2^n -1)p + 1 (d) (p+1)^n Ans= (d) Q15. Which of the following features usually applies to data in a data warehouse? (a) Data are often deleted (b) Most applications consist of transactions (c) Data are rarely deleted (d) Relatively few records are processed by applications Ans: (c) Q16. What is the relation between candidate and frequent itemsets? (a) A candidate itemset is always a frequent itemset (b) A frequent itemset must be a candidate itemset (c) No relation between the two (d) Both are same Ans: b Q17. Which technique finds the frequent itemsets in just two database scans? (a) Partitioning (b) Sampling (c) Hashing (d) Dynamic itemset counting Ans: (a) Q18. What is the principle on which Apriori algorithm work? (a) If a rule is infrequent, its specialized rules are also infrequent (b) If a rule is infrequent, its generalized rules are also infrequent (c) Both a and b (d) None of the above Ans: (a) Q19. Why is correlation analysis important? (a) To make apriori memory efficient (b) To weed out uninteresting frequent itemsets (c) To find large number of interesting itemsets (d) To restrict the number of database iterations Ans: (b) Q20. The..............step eliminates the extensions of (k-1)-itemsets which are not found to be frequent from being considered for counting support (a)partitioning (b)candidate generation (c) Itemset eliminations (d) Pruning Ans= (d) Q21. Hierarchical agglomerative clustering is typically visualized as (a) Dendogram (b)Binary trees (c) Block Diagram (d) Graph Ans= (a) Q22. Factless fact table in a data warehouse contains (a)only measures (b)only dimensions (c)keys and measures (d)only surrogate keys Ans= (b) Q23. In a rule based classifier, if there is a rule for each combination of attribute values, what do you call that rule set R (a) Exhaustive (b) Inclusive (c) Comprehensive (d) Mutually exclusive Ans= (a) Q24. If two variables V1 and V2, are used for clustering. Which of the following are true for K means clustering with k =3? 1. If V1 and V2 has a correlation of 1, the cluster centroids will be in a straight line 2. If V1 and V2 has a correlation of 0, the cluster centroids will be in straight line (a) 1 only (b) 2 only (c) 1 and 2 (d) None of the above Ans=(a) Q25. Repository of information gathered from multiple sources , storing under unified scheme at a single site is known as (a)Data mining (b) metadata (c) data warehousing (d) database Ans= (c) Q26. Which of the following clustering algorithms suffers from the problem of convergence at local optima? 1. K- Means clustering algorithm 2. Agglomerative clustering algorithm 3. Expectation-Maximization clustering algorithm 4. Diverse clustering algorithm (a) 1 &3 (b) 2 & 3 (c) 1,2 & 4 (d) all of above Ans=(d) Q27. Feature scaling is an important step before applying K-Mean algorithm. What is reason behind this? (a) In distance calculation it will give the same weights for all features (b) You always get the same clusters. If you use or don’t use feature scaling (c) In Manhattan distance it is an important step but in Euclidian it is not (d) None of these Ans=(a) Q28.Which of the following are true? 1. Clustering analysis is negatively affected by multicollinearity of features 2. Clustering analysis is negatively affected by heteroscedasticity (a) 1 only (b) 2 only (c) 1 and 2 (d) None of them Ans=(a) Q29.Which of the following evaluation metrics can be used to evaluate a model while modeling a continuous output variable? A) AUC-ROC B) Accuracy C) Logloss D) Mean-Squared-Error Ans=(d) Q30. When you find noise in data which of the following option would you consider in k-NN? A) increasing the value of k B) decreasing the value of k C) Noise cannot be dependent on value of k D) None of these Ans=(a)