Shivaji University, Kolhapur
Question Bank for Mar 2022 (Summer) Examination
Subject code: 80359
Subject Name: Data Warehousing and Data Mining
Common subject code (if any) _________________________________
1) Detailed data in a single fact table is otherwise known as ______
a) monatomic data
b) diatomic data
c) atomic data
d) None of these
2) A data warehouse is _______
a) updated by end users
b) organized around important subject areas
c) contains numerous naming conventions and formats
d) contains only current data
3) ______ is data about data.
a) Metadata
b) Micro data
c) Mini data
d) Multi data
4) The father of data warehouse is ________
a) Bill Warner
b) Bill Inmon
c) Bill Gauss
d) Bill Gate
5) OLTP stands for ______
a) Online Transaction Processing
b) Online Transport Processing
c) Online Transaction Protocol
d) none of these
6) How many components are there in a data warehouse?
a) two
b) three
c) four
d) five
7) ______ is/are OLAP operations.
a) Drill down
b) Dice
c) Slice
d) All of these
8) _______ are applications of data warehouse.
a) Financial services
b) Consumer goods
c) Controlled manufacturing
d) All of these
9) An important aspect of the data warehouse environment is that data found within the data
warehouse is _______.
a) subject-oriented
b) time-variant
c) integrated
d) All of these
10) Fact tables are ___________.
a) completely denormalized
b) partially denormalized
c) completely normalized
d) partially normalized
11) KDD stands for_______
a) Knowledge Discovery from Data
b) Knowledge Details in Data
c) Know Discovered Data
d) None of these
12) Data cleaning includes _______
a) removing irrelevant observations
b) handling missing data
c) filtering unwanted outliers
d) All of these
13) Data integration is the process of ________
a) collecting data from a single source
b) combining data from different sources
c) combining data from different forms
d) None of these
14) Extreme values that occur infrequently are called ______
a) outliers
b) rare values
c) dimensionality reduction
d) None of these
15) Data that are not of interest to the data mining task are called ______ data
a) missing
b) irrelevant
c) changing
d) noisy
16) Converting data from different sources into a common format for processing is called
________.
a) preprocessing
b) transformation
c) selection
d) interpretation
17) _______ refers to how often a given rule appears in the database being mined.
a) Confidence
b) Support
c) Count
d) None of these
18) The first phase of the Apriori algorithm is _______.
a) Candidate generation.
b) Itemset generation.
c) Pruning.
d) Partitioning.
19) The Apriori algorithm was proposed by R. Agrawal and R. Srikant in ______ for finding frequent
itemsets in a dataset.
a) 1990
b) 1991
c) 1992
d) 1994
20) ______ is an example of association analysis.
a) Market Basket Analysis
b) Mixed Basket Analysis
c) Modern Basket Analysis
d) None of these
21) Classification rules are extracted from ____________
a) Root node
b) decision tree
c) branches
d) None of these
22) SVM stands for __________
a) Support Vector Marking
b) Support Vector Machine
c) Supplementary Virtual Machine
d) None of these
23) The Gini index is associated with the ___________ classifier.
a) Naïve Bayes
b) SVM
c) Decision tree
d) None of these
24) A decision tree is a structure that includes a _____
a) root node
b) branches
c) leaf nodes
d) All of these
25) In the k-nearest neighbor algorithm, k stands for ________
a) number of neighbors that are investigated
b) number of iterations
c) number of total records
d) None of these
26) The _______ are present on the hyperplane.
a) support vectors
b) support points
c) support lines
d) None of these
27) ______ algorithms generate if-then rules to perform the classification.
a) Rule based classification
b) Condition based classification
c) Value based classification
d) None of these
28) A _____ simply stores the training data and waits until test data appears.
a) smart learner
b) lazy learner
c) active learner
d) passive learner
29) Rule-based classification algorithms generate ______ rules to perform the classification.
a) if-then
b) while
c) do-while
d) switch
30) A ______ algorithm is used to build a decision tree classifier from a given dataset of training
instances.
a) Greedy
b) Bayes
c) ETL
d) None of these
31) K-means algorithm is used for __________
a) clustering
b) prediction
c) regression
d) None of these
32) Cluster is _____
a) group of similar objects that differ significantly from other objects
b) operations on a database to transform data for machine-learning algorithm
c) symbolic representation of ideas from which information can be extracted
d) none of these
33) Types of clustering are ________
a) centroid-based
b) density-based
c) hierarchical
d) All of these
34) The goal of _____ is to discover both the dense and sparse regions of a data set.
a) Classification
b) Clustering
c) Association rule
d) Genetic Algorithm
35) DBSCAN stands for ______
a) Density-Based Spatial Classification of Applications with Noise
b) Density-Based Spatial Clustering of Applied Name
c) Density-Based Spatial Clustering of Applications with Noise
d) None of these
36) In the ________ algorithm, each cluster is represented by the center of gravity of the cluster.
a) k-medoid
b) k-means
c) STIRR
d) ROCK
37) ______ is the clustering technique which needs the merging approach.
a) Naive Bayes
b) Hierarchical
c) Partitioned
d) None of these
38) The _____ clustering technique starts with as many clusters as there are records, with each cluster
having only one record.
a) Agglomerative
b) Divisive
c) Partition
d) Numeric
39) In web mining, _______ is used to find natural groupings of users, pages, etc.
a) clustering.
b) associations.
c) sequential analysis.
d) classification.
40) Which of the following is used to examine data collected by search engines and web spiders?
a) Web content mining
b) Web usage mining
c) Web structure mining
d) None of these
1) Define data warehouse. Differentiate between ROLAP and HOLAP.
2) Compare and contrast operational database systems with data warehouse.
3) What is the importance of data marts in data warehouse?
4) With the help of examples explain various OLAP operations.
5) Give examples for defining star, snowflake and fact constellation schemas.
6) Discuss the three-tier data warehouse architecture.
7) Describe the process of data cleaning.
8) Compare and contrast online transaction processing with online analytical processing.
9) With the necessary diagrams and examples of data cubes, explain various OLAP operations.
10) Design a fact constellation table with a suitable example.
11) Differentiate ROLAP, MOLAP and HOLAP server functionalities.
12) “Data preprocessing is necessary before data mining process”. Justify your answer.
13) Write in brief about schemas in multidimensional data model.
14) What is data warehouse and what are its four key features?
15) What is metadata in a data warehouse? What does it contain?
16) With the help of a diagram, explain all the steps in the KDD process.
17) Define data mining. Enlist and explain various forms of data preprocessing.
18) What is meant by noisy data? How is noisy data handled in practice?
19) Enlist and explain the different applications of data mining.
20) Enlist and explain major issues in data mining.
21) Define data mining. Enlist and explain various challenges of data mining.
22) Enlist and explain data mining functionalities.
23) Define association analysis. Explain market-basket analysis technique.
24) With a suitable example, explain the FP-growth algorithm.
25) Explain the concept of a maximal frequent itemset with a suitable example.
26) Explain the concept of a closed frequent itemset with a suitable example.
27) Consider the market basket transactions given in the following table.
Let the minimum support count be 2 and the minimum confidence be 60%.
Find all the frequent itemsets using the Apriori algorithm.
Trans ID   Items purchased
T1         Pencil, Eraser, Paper
T2         Eraser, Pen
T3         Eraser, Sharpener
T4         Pencil, Eraser, Pen
T5         Pencil, Sharpener
T6         Eraser, Sharpener
T7         Pencil, Sharpener
T8         Pencil, Eraser, Sharpener, Paper
T9         Pencil, Eraser, Sharpener
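The level-wise search asked for in question 27 can be sketched in Python as follows. This is a minimal illustration, not a model answer: the function name `apriori` and the variable names are hypothetical, the minimum support is treated as an absolute count of 2, and the join step simply unions pairs of frequent (k-1)-itemsets rather than using the prefix-based join of the original algorithm. Rule generation against the 60% confidence threshold is not covered.

```python
from itertools import combinations

# Transactions T1..T9 copied from the table in question 27.
transactions = [
    {"Pencil", "Eraser", "Paper"},
    {"Eraser", "Pen"},
    {"Eraser", "Sharpener"},
    {"Pencil", "Eraser", "Pen"},
    {"Pencil", "Sharpener"},
    {"Eraser", "Sharpener"},
    {"Pencil", "Sharpener"},
    {"Pencil", "Eraser", "Sharpener", "Paper"},
    {"Pencil", "Eraser", "Sharpener"},
]


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    # Level 1: count individual items and keep those meeting the threshold.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step (simplified): union pairs of frequent (k-1)-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: scan the transactions once per candidate.
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_count:
                current[cand] = n
        frequent.update(current)
        k += 1
    return frequent


frequent_itemsets = apriori(transactions, min_count=2)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is where the Apriori property does its work: a candidate is only counted against the transactions if all of its smaller subsets are already known to be frequent.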
28) A database has five transactions. Let min_sup = 60% and min_conf = 80%.
Trans ID   Items purchased
T100       {M,O,N,K,E,Y}
T200       {D,O,N,K,E,Y}
T300       {M,A,K,E}
T400       {M,U,C,K,Y}
T500       {C,O,O,K,I,E}
Find all frequent itemsets using Apriori algorithm.
29) A database has ten transactions. Let min_sup = 30% and min_conf = 70%.
Trans ID   Items purchased
1          {a,b,d,e}
2          {b,c,d}
3          {a,b,d,e}
4          {a,c,d,e}
5          {b,c,d,e}
6          {b,d,e}
7          {c,d}
8          {a,b,c}
9          {a,d,e}
10         {b,d}
Find all frequent itemsets using Apriori algorithm.
30) Consider the following transaction database:
Trans ID   Items purchased
1          A,B,C,D
2          A,B,C,D,E,G
3          A,C,G,H,K
4          B,C,D,E,K
5          D,E,F,H,L
6          A,B,C,D,L
7          A,D,F,L
8          B,I,E,K,L
9          A,B,D,E,K
10         C,D,H,I,K
11         A,E,F,H,L
12         B,C,D,F
13         A,B,C,D
14         A,D,H,K
15         B,C,D,E,H,L
Apply the Apriori algorithm with minimum support of 30% and minimum confidence of
75% and find all the association rules in the data set.
31) Differentiate between supervised and unsupervised learning with examples.
32) Enlist and discuss various issues regarding classification and prediction.
33) What is a decision tree? Describe the major strengths and weaknesses of decision tree
methods.
34) What is classification? Enlist classification techniques and explain any one of your
choice.
35) What is a decision tree? Explain how classification is done using decision tree induction.
36) What are the concepts of linear and non-linear classification? Explain how data is classified
using a support vector machine.
37) What is a decision tree? What is an attribute selection measure?
38) Explain Bayesian classifiers in detail with an example.
39) Explain Naïve Bayesian classification in detail with an example.
40) What is classification? How does classification work?
41) Explain the concept of rule-based classification.
42) Explain how data is classified using the k-nearest neighbor classifier technique.
43) What is classification? Explain the types of SVM. How does SVM work?
44) Explain how data is classified using a support vector machine.
45) Explain how the SVM technique is used to classify data even in high dimensions. State its
advantages and disadvantages.
46) What is meant by clustering? Explain any one clustering algorithm of your choice.
47) Define clustering. Explain any one clustering technique in detail.
48) Define clustering. Enlist different types of clustering techniques. Explain hierarchical
clustering technique in detail.
49) What is clustering? Explain k-means clustering algorithm with example.
50) Differentiate between prototype-based clustering and density-based clustering.
51) What is the main objective of clustering? Give the categorization of clustering
approaches. Briefly discuss them.
52) What are key issues in hierarchical clustering? Explain.
53) Explain the basic agglomerative hierarchical clustering algorithm.
54) Discuss the similarity measures and distance measures frequently used in clustering the data.
55) Describe the DBSCAN clustering technique in detail.
56) How are data sets clustered using the k-medoid clustering algorithm?
57) Enlist and explain various applications of data mining.
58) Explain the concept of web mining. Explain the web content mining techniques in detail.
59) Explain the concept of web mining. Explain the web structure mining techniques in detail.
60) Explain the concept of web mining. Explain the web usage mining techniques in detail.