Data Warehousing & Mining Question Bank

Shivaji University, Kolhapur
Question Bank for Mar 2022 (Summer) Examination
Subject code: 80359
Subject Name: Data Warehousing and Data Mining
Common subject code (if any): _________________________________

1) Detailed data in a single fact table is otherwise known as ______
    a) monatomic data
    b) diatomic data
    c) atomic data
    d) None of these
2) A data warehouse is _______
    a) updated by end users
    b) organized around important subject areas
    c) contains numerous naming conventions and formats
    d) contains only current data
3) ______ is data about data.
    a) Metadata
    b) Micro data
    c) Mini data
    d) Multi data
4) The father of data warehouse is ________
    a) Bill Warner
    b) Bill Inmon
    c) Bill Gauss
    d) Bill Gate
5) OLTP stands for ______
    a) Online Transaction Processing
    b) Online Transport Processing
    c) Online Transaction Protocol
    d) None of these
6) How many components are there in a data warehouse?
    a) two
    b) three
    c) four
    d) five
7) ______ is/are OLAP operations.
    a) Drill down
    b) Dice
    c) Slice
    d) All of these
8) _______ are applications of data warehouse.
    a) Financial services
    b) Consumer goods
    c) Controlled manufacturing
    d) All of these
9) The important aspect of the data warehouse environment is that data found within the data warehouse is _______.
    a) subject-oriented
    b) time-variant
    c) integrated
    d) All of these
10) Fact tables are ___________.
    a) completely denormalized
    b) partially denormalized
    c) completely normalized
    d) partially normalized
11) KDD stands for _______
    a) Knowledge Discovery from Data
    b) Knowledge Details in Data
    c) Know Discovered Data
    d) None of these
12) Data cleaning includes _______
    a) removing irrelevant observations
    b) handling missing data
    c) filtering unwanted outliers
    d) All of these
13) Data integration is the process of ________
    a) collecting data from a single resource
    b) combining data from different resources
    c) combining data from different forms
    d) None of these
14) Extreme values that occur infrequently are called ______
    a) outliers
    b) rare values
    c) dimensionality reduction
    d) None of these
15) Data that are not of interest to the data mining task are called ______ data.
    a) missing
    b) irrelevant
    c) changing
    d) noisy
16) Converting data from different sources into a common format for processing is called ________.
    a) preprocessing
    b) transformation
    c) selection
    d) interpretation
17) _______ refers to how often a given rule appears in the database being mined.
    a) Confidence
    b) Support
    c) Count
    d) None of these
18) The first phase of the Apriori algorithm is _______.
    a) Candidate generation
    b) Itemset generation
    c) Pruning
    d) Partitioning
19) The Apriori algorithm was proposed by R. Agrawal and R. Srikant in ______ for finding frequent itemsets in a dataset.
    a) 1990
    b) 1991
    c) 1992
    d) 1994
20) ______ is an example of association analysis.
    a) Market Basket Analysis
    b) Mixed Basket Analysis
    c) Modern Basket Analysis
    d) None of these
21) Classification rules are extracted from ____________
    a) root node
    b) decision tree
    c) branches
    d) None of these
22) SVM stands for __________
    a) Support Vector Marking
    b) Support Vector Machine
    c) Supplementary Virtual Machine
    d) None of these
23) The term Gini index is related to the ___________ classifier.
    a) Naïve Bayes
    b) SVM
    c) Decision tree
    d) None of these
24) A decision tree is a structure that includes a _____
    a) root node
    b) branches
    c) leaf nodes
    d) All of these
25) In the k-nearest neighbor algorithm, k stands for ________
    a) number of neighbors that are investigated
    b) number of iterations
    c) number of total records
    d) None of these
26) The _______ are present on the hyperplane.
    a) support vectors
    b) support points
    c) support lines
    d) None of these
27) ______ algorithms generate if-then rules to perform the classification.
    a) Rule based classification
    b) Condition based classification
    c) Value based classification
    d) None of these
28) A _____ simply stores the training data and waits until test data appear.
    a) smart learner
    b) lazy learner
    c) active learner
    d) passive learner
29) Rule based classification algorithms generate ______ rules to perform the classification.
    a) if-then
    b) while
    c) do-while
    d) switch
30) A ______ algorithm is used to build a decision tree classifier from a given dataset of training instances.
    a) Greedy
    b) Bayes
    c) ETL
    d) None of these
31) The k-means algorithm is used for __________
    a) clustering
    b) prediction
    c) regression
    d) None of these
32) A cluster is _____
    a) a group of similar objects that differ significantly from other objects
    b) operations on a database to transform data for a machine-learning algorithm
    c) a symbolic representation of ideas from which information can be extracted
    d) None of these
33) Types of clustering are ________
    a) centroid-based
    b) density-based
    c) hierarchical
    d) All of these
34) The goal of _____ is to discover both the dense and sparse regions of a data set.
    a) Classification
    b) Clustering
    c) Association rule
    d) Genetic Algorithm
35) DBSCAN stands for ______
    a) Density-Based Spatial Classification of Applications with Noise
    b) Density-Based Spatial Clustering of Applied Name
    c) Density-Based Spatial Clustering of Applications with Noise
    d) None of these
36) In the ________ algorithm, each cluster is represented by the center of gravity of the cluster.
    a) k-medoid
    b) k-means
    c) STIRR
    d) ROCK
37) ______ is the clustering technique which needs the merging approach.
    a) Naive Bayes
    b) Hierarchical
    c) Partitioned
    d) None of these
38) The _____ clustering technique starts with as many clusters as there are records, with each cluster having only one record.
    a) Agglomerative
    b) Divisive
    c) Partition
    d) Numeric
39) In web mining, _______ is used to find natural groupings of users, pages, etc.
    a) clustering
    b) associations
    c) sequential analysis
    d) classification
40) Which of the following is used to examine data collected by search engines and web spiders?
    a) Web content mining
    b) Web usage mining
    c) Web structure mining
    d) None of these

1) Define data warehouse. Differentiate between ROLAP and HOLAP.
2) Compare and contrast operational database systems with data warehouse.
3) What is the importance of data marts in a data warehouse?
4) With the help of examples, explain various OLAP operations.
5) Give examples for defining star, snowflake and fact constellation schemas.
6) Discuss the three-tier data warehouse architecture.
7) Describe the process of data cleaning.
8) Compare and contrast online transaction processing with online analytical processing.
9) With necessary diagrams and examples of data cubes, explain various OLAP operations.
10) Design a fact constellation table with a suitable example.
11) Differentiate ROLAP, MOLAP and HOLAP server functionalities.
12) “Data preprocessing is necessary before data mining process”. Justify your answer.
13) Write in brief about schemas in the multidimensional data model.
14) What is a data warehouse and what are its four key features?
15) What is metadata in a data warehouse? What does it contain?
16) With the help of a diagram, explain all the steps in the KDD process.
17) Define data mining. Enlist and explain various forms of data preprocessing.
18) What is meant by noisy data? How is noisy data handled in practice?
19) Enlist and explain the different applications of data mining.
20) Enlist and explain major issues in data mining.
21) Define data mining. Enlist and explain various challenges of data mining.
22) Enlist and explain data mining functionalities.
23) Define association analysis. Explain the market-basket analysis technique.
24) With a suitable example, explain the FP-growth algorithm.
25) Explain the concept of maximal frequent itemset with a suitable example.
26) Explain the concept of closed frequent itemset with a suitable example.
27) Consider the market basket transactions given in the following table. Let the minimum support be 2 and the minimum confidence be 60%. Find all the frequent item sets using the Apriori algorithm.

    Trans ID    Items purchased
    T1          Pencil, Eraser, Paper
    T2          Eraser, Pen
    T3          Eraser, Sharpener
    T4          Pencil, Eraser, Pen
    T5          Pencil, Sharpener
    T6          Eraser, Sharpener
    T7          Pencil, Sharpener
    T8          Pencil, Eraser, Sharpener, Paper
    T9          Pencil, Eraser, Sharpener

28) A database has five transactions. Let min_sup = 60% and min_conf = 80%. Find all frequent itemsets using the Apriori algorithm.

    Trans ID    Items purchased
    T100        {M,O,N,K,E,Y}
    T200        {D,O,N,K,E,Y}
    T300        {M,A,K,E}
    T400        {M,U,C,K,Y}
    T500        {C,O,O,K,I,E}

29) A database has ten transactions. Let min_sup = 30% and min_conf = 70%. Find all frequent itemsets using the Apriori algorithm.

    Trans ID    Items purchased
    1           {a,b,d,e}
    2           {b,c,d}
    3           {a,b,d,e}
    4           {a,c,d,e}
    5           {b,c,d,e}
    6           {b,d,e}
    7           {c,d}
    8           {a,b,c}
    9           {a,d,e}
    10          {b,d}
30) Consider the following transaction database. Apply the Apriori algorithm with a minimum support of 30% and a minimum confidence of 75% and find all the association rules in the data set.

    Trans ID    Items purchased
    1           A,B,C,D
    2           A,B,C,D,E,G
    3           A,C,G,H,K
    4           B,C,D,E,K
    5           D,E,F,H,L
    6           A,B,C,D,L
    7           A,D,F,L
    8           B,I,E,K,L
    9           A,B,D,E,K
    10          C,D,H,I,K
    11          A,E,F,H,L
    12          B,C,D,F
    13          A,B,C,D
    14          A,D,H,K
    15          B,C,D,E,H,L

31) Differentiate between supervised and unsupervised learning with examples.
32) Enlist and discuss various issues regarding classification and prediction.
33) What is a decision tree? Describe the major strengths and weaknesses of decision tree methods.
34) What is classification? Enlist and explain any one classification technique of your choice.
35) What is a decision tree? Explain how classification is done using decision tree induction.
36) What are the concepts of linear and non-linear classification? Explain how data is classified using a support vector machine.
37) What is a decision tree? What is an attribute selection measure?
38) Explain Bayesian classifiers in detail with an example.
39) Explain Naïve Bayesian classification in detail with an example.
40) What is classification? How does classification work?
41) Explain the concept of rule based classification.
42) Explain how data is classified using the k-nearest neighbor classifier technique.
43) What is classification? Explain the types of SVM. How does SVM work?
44) Explain how data is classified using a support vector machine.
45) Explain how the SVM technique is used to classify data even in high dimensions. State its advantages and disadvantages.
46) What is meant by clustering? Explain any one clustering algorithm of your choice.
47) Define clustering. Explain any one clustering technique in detail.
48) Define clustering. Enlist different types of clustering techniques. Explain the hierarchical clustering technique in detail.
49) What is clustering? Explain the k-means clustering algorithm with an example.
50) Differentiate between prototype based clustering and density based clustering.
51) What is the main objective of clustering? Give the categorization of clustering approaches. Briefly discuss them.
52) What are the key issues in hierarchical clustering? Explain.
53) Explain the basic agglomerative hierarchical clustering algorithm.
54) Discuss the similarity measures and distance measures frequently used in clustering data.
55) Describe the DBSCAN clustering technique in detail.
56) How are data sets clustered using the k-medoid clustering algorithm?
57) Enlist and explain various applications of data mining.
58) Explain the concept of web mining. Explain the web content mining techniques in detail.
59) Explain the concept of web mining. Explain the web structure mining techniques in detail.
60) Explain the concept of web mining. Explain the web usage mining techniques in detail.
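
Answers to the Apriori exercises in this question bank can be cross-checked with a short script. The following is a minimal sketch, not part of the original question bank: the function name `apriori` and the variable names are illustrative assumptions, and the data is the five-transaction database with min_sup = 60%, entered by hand as a support count of 3. It performs only the frequent-itemset step (join, prune, count) and does not generate association rules or apply min_conf.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset.

    transactions: iterable of item collections (duplicates are ignored).
    min_count: minimum support expressed as an absolute transaction count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count step: scan the database once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1

    return frequent


# Five-transaction database from the exercise; each string is one transaction.
database = ["MONKEY", "DONKEY", "MAKE", "MUCKY", "COOKIE"]
min_count = 3  # 60% of 5 transactions, written as a count (assumption of this sketch)

frequent = apriori(database, min_count)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Passing the support threshold as a count avoids rounding questions when a percentage does not divide the number of transactions evenly; for the other exercises the count would need to be worked out by hand in the same way.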
