UNIT4-QB


CS2032 DATA WAREHOUSING AND DATA MINING

SURYA GROUP OF INSTITUTIONS

SCHOOL OF ENGINEERING & TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

ACADEMIC YEAR 2011-2012 / ODD SEMESTER

SUBJECT CODE / SUBJECT NAME: CS2032 / DATA WAREHOUSING AND DATA MINING

YEAR/SEM: IV/VII

UNIT 4 - ASSOCIATION RULE MINING AND CLASSIFICATION

PART A (2 MARKS)

1. Explain the categories of frequent pattern mining.
2. Define: i) frequent itemset ii) association rule.
3. What is meant by a frequent itemset?
4. What is the frequent itemset property?
5. List the two interestingness measures of an association rule.
6. Write the two measures of an association rule.
7. What are the interestingness measures of association rule mining?
8. Define support and confidence.
9. What are the Apriori properties used in the Apriori algorithm?
10. What are the means to improve the performance of association rule mining algorithms?
11. Define conditional pattern base.
12. Write the use of the conditional pattern base in an FP-tree.
13. Define item merging.
14. Define sub-itemset pruning.
15. Define item skipping.
16. State how multilevel association rules can be mined.
17. Define multilevel association rule.
18. Define lift.
19. What is constraint-based association mining?
20. Define the antimonotonic property.
21. Define the monotonic property.
22. Define the succinct property.
23. Define the convertible property.
24. Define the inconvertible property.
25. Distinguish between clustering and classification.
26. How is prediction different from classification?
27. What is the difference between classification and prediction?
28. What is supervised learning? Give an example.
29. What is unsupervised learning? Give an example.
30. Define relevance analysis.
31. State the advantages of the decision tree approach over other approaches for performing classification.
32. Define decision tree.


33. List the attribute selection measures used in decision tree classification.
34. Justify the need for a pruning phase in decision tree construction.
35. In classification trees, what are surrogate splits (split criteria) and how are they used?
36. List the major strengths of the decision tree method.
37. Why is naïve Bayesian classification called naïve?
38. What assumption does the naïve Bayesian classifier make that motivates its name?
39. Define rule accuracy and rule coverage.
40. Give the rule quality measures.
41. Define rule pruning.
42. How does backpropagation work?
43. Define hyperplane.
44. Define support vectors.
45. Define CBA.
46. Define CMAR.
47. Define lazy learners.
48. Define case-based reasoning.
49. Define the fuzzy set approach with an example.
50. What is linear regression?
51. What is non-linear regression?
52. Define log-linear models.
53. Define regression trees.
54. What are the factors to be considered when comparing classification methods?

PART B

1. Explain the Apriori algorithm with an example. (16)

2. i) Discuss each step in the Apriori algorithm for discovering frequent itemsets from a database. (8)
   ii) Apply the Apriori algorithm to discover frequent itemsets from the following dataset. Use 0.4 for the minimum support value.

   Trans ID   Items purchased
   101        Milk, Bread, Eggs
   102        Milk, Juice
   103        Juice, Butter
   104        Milk, Bread, Eggs
   105        Coffee, Eggs
   106        Coffee
   107        Coffee, Juice
   108        Milk, Bread, Cookies, Eggs
   109        Cookies, Butter
   110        Milk, Bread
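
A minimal Apriori sketch for part ii), offered as an illustration rather than a prescribed solution: the transactions and the 0.4 minimum support come from the question; the function and variable names (`support`, `min_count`, etc.) are my own.

```python
from itertools import combinations

transactions = [
    {"Milk", "Bread", "Eggs"}, {"Milk", "Juice"}, {"Juice", "Butter"},
    {"Milk", "Bread", "Eggs"}, {"Coffee", "Eggs"}, {"Coffee"},
    {"Coffee", "Juice"}, {"Milk", "Bread", "Cookies", "Eggs"},
    {"Cookies", "Butter"}, {"Milk", "Bread"},
]
min_count = 0.4 * len(transactions)  # support 0.4 -> at least 4 transactions

def support(itemset):
    """Count the transactions that contain every item of the candidate."""
    return sum(itemset <= t for t in transactions)

# L1: frequent 1-itemsets
items = sorted({i for t in transactions for i in t})
L = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_count}]

# Level-wise search: join L(k-1) with itself, then prune any candidate
# with an infrequent (k-1)-subset (the Apriori property).
k = 2
while L[-1]:
    prev = L[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    candidates = {c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))}
    L.append({c for c in candidates if support(c) >= min_count})
    k += 1

for level, itemsets in enumerate(L[:-1], start=1):
    print(f"L{level}:", [sorted(s) for s in sorted(itemsets, key=sorted)])
```

On this data the sketch reports L1 = {Bread}, {Eggs}, {Milk} and L2 = {Bread, Milk}; the pairs involving Eggs appear in only 3 transactions, one short of the threshold.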

3. A database has four transactions. Let min_sup = 60% and min_conf = 80%.

   TID    DATE       ITEMS_BOUGHT
   T100   10/15/99   {K, A, D, B}
   T200   10/15/99   {D, A, C, E, B}
   T300   10/19/99   {C, A, B, E}
   T400   10/22/99   {B, A, D}

   I. Find all frequent itemsets using Apriori and FP-growth respectively. Compare the efficiency of the two mining processes. (10)
   II. How might the efficiency of Apriori be improved? Discuss. (6)
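
A quick hand count for this exercise, offered as an unofficial sketch: with four transactions, min_sup = 60% means an itemset must occur in at least 3 of them.

L1 = { {A}:4, {B}:4, {D}:3 }        (C:2, E:2, K:1 fall below the threshold)
L2 = { {A,B}:4, {A,D}:3, {B,D}:3 }
L3 = { {A,B,D}:3 }

FP-growth finds the same itemsets but scans the database only twice, building an FP-tree instead of generating and testing candidates level by level. As a confidence check for rule generation: conf(D => A) = 3/3 = 100% passes min_conf = 80%, while conf(A => D) = 3/4 = 75% fails.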

4. Given the following transactional database:

   1   C, B, H
   2   B, F, S
   3   A, F, G
   4   C, B, H
   5   B, F, G
   6   B, E, O

   I. Mine all the frequent itemsets in the data using the Apriori algorithm, assuming a minimum support of 30%. (Give the sets of frequent itemsets L1, L2, ... and the candidate itemsets C2, C3, ....) (9)
   II. Find all the association rules that involve only B, C, H (on either the left- or right-hand side of the rule). The minimum confidence is 70%. (7)

5. Discuss the Apriori algorithm with a suitable example and explain how its efficiency can be improved. (16)
6. Write an algorithm for FP-tree construction and explain how frequent itemsets are generated from the FP-tree. (16)
7. Explain the frequent pattern growth (FP-growth) algorithm. (16)
8. Write the algorithm to discover frequent itemsets without candidate generation and explain it with an example. (8+8)
9. Write and explain the algorithm for mining frequent itemsets without candidate generation. Give a relevant example.
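
A compact sketch of FP-tree construction for questions 6-9 (my own illustration; the `Node` layout and names are not taken from any particular textbook's code): two database scans, with each transaction inserted in descending order of item support so that common prefixes share nodes. The demo data is the four-transaction database from question 3.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count single items and keep only the frequent ones.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Scan 2: insert each transaction with its frequent items sorted by
    # descending support, sharing prefixes with earlier transactions.
    root = Node(None, None)
    header = defaultdict(list)        # item -> node links into the tree
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            else:
                child.count += 1
            node = child
    return root, header

# The conditional pattern base of an item is read off by following each
# node link upward to the root; FP-growth recurses on those prefix paths.
tree, header = build_fp_tree(
    [{"A", "B", "D", "K"}, {"A", "B", "C", "D", "E"},
     {"A", "B", "C", "E"}, {"A", "B", "D"}],
    min_count=3)
for item, nodes in header.items():
    print(item, [(n.parent.item, n.count) for n in nodes])
```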

10. With a relevant example, explain the single-dimensional Boolean association rule mining algorithm for transactional databases. (16)
11. Explain mining single-dimensional Boolean association rules from transactional databases and multidimensional association rules from relational databases. Give suitable examples. (16)

12. With an example, discuss multilevel association rules. (16)
13. Describe multidimensional association rules, giving a suitable example. (16)
14. List and discuss the steps for mining multilevel association rules from transactional databases. Give a relevant example.
15. Discuss the approaches for mining multilevel association rules from transactional databases. Give a relevant example.
16. Describe in detail constraint-based association mining.

17. Briefly outline the major steps of decision tree classification. Why is tree pruning useful in decision tree induction? What is a drawback of using a separate set of tuples to evaluate pruning?
18. Briefly outline the major steps of decision tree classification. What are the advantages and disadvantages of decision trees over other classification techniques?
19. What is a decision tree? Briefly explain classification by decision tree induction. (10)
20. Why is tree pruning useful in decision tree induction? Give an example. Also state the drawback of using a separate set of samples to evaluate pruning.
21. Briefly discuss the major steps involved in the induction of decision trees used in the ID3 algorithm. (16)
22. Explain the algorithm for constructing a decision tree from training samples.
23. What is a decision tree? Briefly explain classification by decision tree induction. (10) & (8)
24. Discuss the issues that have to be addressed to perform classification using decision trees. (12)
25. Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification. (16)

26. Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The height values have already been discretized into distinct ranges. Calculate the information gain if height is chosen as the test attribute. Draw the final decision tree without any pruning for the training dataset. Generate all the "IF-THEN" rules from the decision tree.

    GENDER   HEIGHT        RISK
    F        {1.5, 1.6}    Low
    F        {1.9, 2.0}    High
    F        {1.8, 1.9}    Medium
    M        {1.8, 1.9}    Medium
    F        {1.6, 1.7}    Low
    M        {1.8, 1.9}    Medium
    F        {1.5, 1.6}    Low
    M        {1.6, 1.7}    Low
    F        {2.0, ∞}      High
    M        {2.0, ∞}      High
    M        {1.7, 1.8}    Medium
    F        {1.9, 2.0}    Medium
    M        {1.8, 1.9}    Medium
    F        {1.7, 1.8}    Medium
    F        {1.7, 1.8}    Medium
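
A sketch of the information-gain computation question 26 asks for. The rows encode the table above as I have paired its columns ("inf" stands in for ∞); the helper names are my own.

```python
from collections import Counter
from math import log2

# (gender, height_range, risk) tuples from the training table.
rows = [
    ("F", "{1.5,1.6}", "Low"),    ("F", "{1.9,2.0}", "High"),
    ("F", "{1.8,1.9}", "Medium"), ("M", "{1.8,1.9}", "Medium"),
    ("F", "{1.6,1.7}", "Low"),    ("M", "{1.8,1.9}", "Medium"),
    ("F", "{1.5,1.6}", "Low"),    ("M", "{1.6,1.7}", "Low"),
    ("F", "{2.0,inf}", "High"),   ("M", "{2.0,inf}", "High"),
    ("M", "{1.7,1.8}", "Medium"), ("F", "{1.9,2.0}", "Medium"),
    ("M", "{1.8,1.9}", "Medium"), ("F", "{1.7,1.8}", "Medium"),
    ("F", "{1.7,1.8}", "Medium"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(attr):
    """ID3 gain: dataset entropy minus the size-weighted split entropy."""
    base = entropy([r[2] for r in rows])
    split = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[2] for r in rows if r[attr] == v]
        split += len(subset) / len(rows) * entropy(subset)
    return base - split

print("Gain(gender):", round(info_gain(0), 4))
print("Gain(height):", round(info_gain(1), 4))
```

ID3 places the attribute with the larger gain at the root; the printed values are exactly what the hand calculation in the answer should reproduce.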

27. What are Bayesian classifiers? Explain. (6)
28. Why is naïve Bayesian classification called "naïve"? Briefly outline the major ideas of naïve Bayesian classification.
29. State Bayes' theorem and discuss how Bayesian classification works. (16)
30. Explain Bayes' theorem.
31. With a relevant example, discuss how the naïve Bayesian classifier works. (12)
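
For questions 28-31, a toy naïve Bayesian classifier sketch. The weather-style tuples are invented for illustration, and Laplace smoothing is an extra I have added to avoid zero probabilities.

```python
from collections import Counter

# Toy training tuples: ((outlook, windy), play)
train = [
    (("sunny", "no"), "yes"), (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"), (("overcast", "no"), "yes"),
    (("rainy", "no"), "yes"),
]

def predict(x):
    """Pick the class maximizing P(C) * prod_i P(x_i | C)."""
    class_counts = Counter(label for _, label in train)
    best, best_score = None, float("-inf")
    for c, nc in class_counts.items():
        score = nc / len(train)                       # prior P(C)
        for i, value in enumerate(x):
            # The "naive" step: attributes are treated as conditionally
            # independent given the class, so likelihoods just multiply.
            matches = sum(1 for feats, label in train
                          if label == c and feats[i] == value)
            distinct = len({feats[i] for feats, _ in train})
            score *= (matches + 1) / (nc + distinct)  # Laplace smoothing
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(("sunny", "no")))  # -> "yes" on this toy data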

32. Describe in detail rule-based classification.
33. What is backpropagation? How does backpropagation work?
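
For question 33, a bare-bones backpropagation sketch: one hidden layer of sigmoid units trained on XOR with squared error and batch gradient descent. This is entirely my own minimal illustration, not a prescribed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(5000):
    # Forward pass: activations flow input -> hidden -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: the output error is propagated back through the
    # network, scaled at each layer by the sigmoid derivative a * (1 - a).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(0, keepdims=True)

print(np.round(out.ravel(), 2))  # typically approaches [0, 1, 1, 0]
```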

34. Describe in detail support vector machines.
35. Define associative classification. (6)

36. Describe lazy learners.
37. Describe in detail the various prediction techniques.
38. Define entropy. What does entropy measure? Discuss. (4)
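
For reference on question 38, the standard definition: for a set S of tuples with m classes, where p_i is the fraction of tuples belonging to class i,

    H(S) = -\sum_{i=1}^{m} p_i \log_2 p_i

Entropy measures the impurity of S in bits: a pure node has H = 0, while a two-class 50/50 split has H = 1.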

39. Discuss the major differences between classification and clustering techniques. (4)
