Uploaded by rupindersinghguru

dwdm mcq qns 2020

DWDM MCQ questions
Q1. Which of the following methods do we use to find the best fit line for data in Linear Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans=(a)
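Note on Q1: the least-squares idea can be illustrated with a minimal sketch. It assumes a single input variable and uses made-up sample points; the function name least_squares_fit is hypothetical and not part of the question bank.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for simple linear regression
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1 (an assumption for the demo)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = least_squares_fit(xs, ys)
print(slope, intercept)
```

With noisy data the fitted line would no longer pass through every point, but it would still be the line with the smallest total squared error.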
Q2. ______ refers loosely to the process of semi-automatically analyzing large databases to find useful patterns.
(a) Data mining
(b) Data warehousing
(c) DBMS
(d) Data mirroring
Ans= (a)
Q3. Consider the following two statements:
S1: Data scrubbing is a process to upgrade the quality of data before it is moved into a data warehouse.
S2: Data scrubbing is a process of rejecting data from a data warehouse to create indexes.
Which one of the following options is correct?
(a) S1 true, S2 false
(b) S1 false, S2 true
(c) Both S1 and S2 false
(d) Both S1 and S2 true
Ans= (a)
Q4. The most common source of change data in refreshing a data warehouse is:
(a) Queryable change data
(b) Cooperative change data
(c) Logged change data
(d) Snapshot change data
Ans: (a)
Q5. Data transformation includes which of the following?
(a) A process to change data from a detailed level to a summary level
(b) A process to change data from a summary level to a detailed level
(c) Joining data from one source into various sources of data
(d) Separating data from one source into various sources of data
Ans: A
Q6. Data warehouse contains ……………. data that is never found in the operational environment.
A) normalized
B) informational
C) summary
D) denormalized
Ans= (c)
Q7. The extract process is which of the following?
(a) Capturing all of the data contained in various operational systems
(b) Capturing a subset of the data contained in various operational systems
(c) Capturing all of the data contained in various decision support systems
(d) Capturing a subset of the data contained in various decision support systems
Ans: B
Q8. An operational system is which of the following?
(a) A system that is used to run the business in real time and is based on historical data.
(b) A system that is used to run the business in real time and is based on current data.
(c) A system that is used to support decision making and is based on current data.
(d) A system that is used to support decision making and is based on historical data.
Ans: B
Q9. Which of the following is not a kind of data warehouse application?
A) Information processing
B) Analytical processing
C) Data mining
D) Transaction processing
Ans= (d)
Q10. …………………….. supports basic OLAP operations, including slice and dice, drill-down, rollup and pivoting.
A) Information processing
B) Analytical processing
C) Data mining
D) Transaction processing
Ans= (b)
Q11. The data from the operational environment enter …………………… of data warehouse.
A) Current detail data
B) Older detail data
C) Lightly Summarized data
D) Highly summarized data
Ans= (a)
Q12. Which of the following statements is true?
(a) The data warehouse consists of data marts and operational data
(b) The data warehouse is used as a source for the operational data
(c) The operational data are used as a source for the data warehouse
(d) All of the above
Ans: (c)
Q13. ______ system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(a) OLTP
(b) OLAP
(c) Data system
(d) Market system
Ans= (b)
Q14. A data cube C has n dimensions, and each dimension has exactly p distinct values in the base
cuboid. Assume that there are no concept hierarchies associated with the dimensions. What is the
maximum number of cells possible in the data cube C?
(a) p^n
(b) p
(c) (2^n -1)p + 1
(d) (p+1)^n
Ans= (d)
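Note on Q14: each dimension of a cell takes either one of its p base values or the aggregate value "*" (all), which gives p + 1 choices per dimension and (p + 1)^n cells in total. The sketch below checks this by brute-force enumeration for small n and p; the function names are hypothetical.

```python
from itertools import product


def max_cube_cells(n, p):
    """Closed-form count: p base values plus '*' for each of n dimensions."""
    return (p + 1) ** n


def enumerate_cells(n, p):
    """List every cell as a tuple; '*' marks a dimension that is aggregated away."""
    values = list(range(p)) + ["*"]
    return list(product(values, repeat=n))


cells = enumerate_cells(3, 2)
print(len(cells), max_cube_cells(3, 2))
```

The base cuboid alone accounts for p^n of these cells; the remainder are aggregate cells in the higher-level cuboids.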
Q15. Which of the following features usually applies to data in a data warehouse?
(a) Data are often deleted
(b) Most applications consist of transactions
(c) Data are rarely deleted
(d) Relatively few records are processed by applications
Ans: (c)
Q16. What is the relation between candidate and frequent itemsets?
(a) A candidate itemset is always a frequent itemset
(b) A frequent itemset must be a candidate itemset
(c) No relation between the two
(d) Both are same
Ans: b
Q17. Which technique finds the frequent itemsets in just two database scans?
(a) Partitioning
(b) Sampling
(c) Hashing
(d) Dynamic itemset counting
Ans: (a)
Q18. What is the principle on which the Apriori algorithm works?
(a) If a rule is infrequent, its specialized rules are also infrequent
(b) If a rule is infrequent, its generalized rules are also infrequent
(c) Both a and b
(d) None of the above
Ans: (a)
Q19. Why is correlation analysis important?
(a) To make apriori memory efficient
(b) To weed out uninteresting frequent itemsets
(c) To find large number of interesting itemsets
(d) To restrict the number of database iterations
Ans: (b)
Q20. The ______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent
from being considered for counting support.
(a) Partitioning
(b) Candidate generation
(c) Itemset eliminations
(d) Pruning
Ans= (d)
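Note on Q18 and Q20: the pruning step can be sketched as follows. This is a simplified illustration, not the textbook join: it enumerates every k-combination of the items seen so far and keeps only those whose (k-1)-subsets are all frequent. The function name and the sample itemsets are assumptions.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Build k-itemset candidates and prune any with an infrequent (k-1)-subset."""
    prev = set(frequent_prev)
    items = sorted({item for itemset in prev for item in itemset})
    candidates = []
    for combo in combinations(items, k):
        # Apriori property: every subset of a frequent itemset must be frequent
        if all(frozenset(sub) in prev for sub in combinations(combo, k - 1)):
            candidates.append(frozenset(combo))
    return candidates


# Hypothetical frequent 2-itemsets
frequent_2 = [frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]]
candidates_3 = generate_candidates(frequent_2, 3)
print(candidates_3)
```

Here {a, b, d} is pruned because {a, d} is not frequent, so its support never has to be counted against the database.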
Q21. Hierarchical agglomerative clustering is typically visualized as
(a) Dendrogram
(b) Binary trees
(c) Block diagram
(d) Graph
Ans= (a)
Q22. Factless fact table in a data warehouse contains
(a) only measures
(b) only dimensions
(c) keys and measures
(d) only surrogate keys
Ans= (b)
Q23. In a rule-based classifier, if there is a rule for each combination of attribute values, what do you
call that rule set R?
(a) Exhaustive
(b) Inclusive
(c) Comprehensive
(d) Mutually exclusive
Ans= (a)
Q24. Two variables, V1 and V2, are used for clustering. Which of the following are true for K-means
clustering with k = 3?
1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line
2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line
(a) 1 only
(b) 2 only
(c) 1 and 2
(d) None of the above
Ans=(a)
Q25. A repository of information gathered from multiple sources and stored under a unified schema at a
single site is known as
(a) Data mining
(b) Metadata
(c) Data warehousing
(d) Database
Ans= (c)
Q26. Which of the following clustering algorithms suffers from the problem of convergence at local
optima?
1. K- Means clustering algorithm
2. Agglomerative clustering algorithm
3. Expectation-Maximization clustering algorithm
4. Diverse clustering algorithm
(a) 1 & 3
(b) 2 & 3
(c) 1,2 & 4
(d) all of above
Ans=(a)
Q27. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind
this?
(a) In distance calculation it will give the same weights for all features
(b) You always get the same clusters, whether or not you use feature scaling
(c) In Manhattan distance it is an important step but in Euclidean it is not
(d) None of these
Ans=(a)
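Note on Q27: the effect of scaling on Euclidean distance can be seen in a small sketch. The data (age in years, income in dollars) and the helper name min_max_scale are made up for illustration, and min-max scaling is only one of several possible scaling choices.

```python
import math


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical records: [age, income]
rows = [[25, 50000], [60, 51000], [26, 90000]]

# Unscaled: the income column dominates the distance
raw_01 = math.dist(rows[0], rows[1])
raw_02 = math.dist(rows[0], rows[2])

# Scaled: a 35-year age gap now counts as much as a 40000-dollar income gap
scaled = min_max_scale(rows)
scaled_01 = math.dist(scaled[0], scaled[1])
scaled_02 = math.dist(scaled[0], scaled[2])
print(raw_01, raw_02, scaled_01, scaled_02)
```

Before scaling, record 1 looks about forty times closer to record 0 than record 2 does, purely because income is measured in larger units; after scaling the two distances are nearly equal.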
Q28. Which of the following are true?
1. Clustering analysis is negatively affected by multicollinearity of features
2. Clustering analysis is negatively affected by heteroscedasticity
(a) 1 only
(b) 2 only
(c) 1 and 2
(d) None of them
Ans=(a)
Q29. Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Ans=(d)
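Note on Q29: mean squared error averages the squared differences between predicted and actual values, which is why it suits a continuous target. A minimal sketch with made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values: errors are 1, 0 and -2, so squared errors are 1, 0 and 4
mse = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(mse)
```

The other three options (AUC-ROC, accuracy, log loss) are defined for class labels or class probabilities rather than continuous outputs.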
Q30. When you find noise in data, which of the following options would you consider in k-NN?
A) increasing the value of k
B) decreasing the value of k
C) Noise cannot be dependent on value of k
D) None of these
Ans=(a)
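Note on Q30: a larger k lets the majority vote outweigh an isolated mislabeled point. The sketch below assumes one-dimensional data with a single noisy "B" label planted inside the "A" cluster; the function name knn_predict and the data are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# (value, label) pairs; the point (1.5, "B") is the planted noise
train = [
    (1.0, "A"), (1.2, "A"), (1.4, "A"), (1.6, "A"),
    (1.5, "B"),
    (5.0, "B"), (5.2, "B"), (5.4, "B"),
]

pred_k1 = knn_predict(train, 1.5, 1)  # follows the single noisy neighbor
pred_k5 = knn_predict(train, 1.5, 5)  # noisy vote is outnumbered
print(pred_k1, pred_k5)
```

Increasing k too far has its own cost, since very large neighborhoods start to blur genuine class boundaries.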