EXAM 732A02 Data Mining – Clustering and Association Analysis April 29, 1-5pm

advertisement
Institutionen för datavetenskap
Linköpings universitet
EXAM
732A02 Data Mining –
Clustering and Association Analysis
April 29, 1-5pm
Teachers: Patrick Lambrix, José M Pena
Grades: Requirement for grade C: 15 points. Requirement for grade A: 23 points
Instructions:
• Start each question at a new page.
• Write at one side of a page.
• Add name and personal number on each page.
• Write clearly.
• If you make assumptions about a question, that are not explicitly stated, you need to write
these down. (These assumptions cannot change the exercise or question.)
Help: dictionary
GOOD LUCK!
1(4)
1. Apriori algoritm (3p+2p=5p)
a. Run the Aprori algorithm on the following transaction database with minimum support
equal to 2 transactions. Explain step by step the execution.
Transaction id
Items
1
A, B, C
2
A, B, C
3
C, D
4
C, D
b. Sketch a proof of the correctness of the Apriori algorithm. It suffices to sketch a proof
showing that all the frequent itemsets of size k are among the candidates of size k.
2. FP algorithm (2p+1p=3p)
a. Run the FP algorithm on the transaction database in exercise 1 with minimum support
equal to 2 transactions. Explain step by step the execution.
b. Discuss the advantages and disadvantages of the FP algorithm with respect to the
Apriori algorithm.
2(4)
3. Constraints (2p+2p+2p=6p)
a. Run the Apriori algorithm on the transaction database in exercise 1 with minimum
support equal to 2 transactions and the constraint that the sum of the prices of the
items in an itemset must be greater than 1 (do not simply run the algorithm and
afterwards consider the constraint but incorporate the constraint into the algorithm).
Explain step by step the execution.
Item
Price
A
1
B
2
C
1
D
1
b. Run the Apriori algorithm on the transaction database in exercise 1 with minimum
support equal to 2 transactions and the constraint that the sum of the prices of the
items in an itemset must be equal or smaller than 2 (do not simply run the algorithm
and afterwards consider the constraint but incorporate the constraint into the
algorithm). Explain step by step the execution.
c. Give an example of a convertible monotone constraint that is not monotone. Give an
example of a convertible antimonotone constraint that is not antimonotone. How
would you incorporate convertible monotone and antimonotone constraints into the
Apriori algorithm ?
4. Clustering by Partitioning (4p+1p=5p)
a. Describe the principles and ideas regarding PAM. Explain the different steps of the
algorithm.
b. Given the graph representation of the clustering problem where a node represents k
mediods (and thus a potential solution for the clustering), and nodes are neighbors if
the sets of objects represented by the nodes differ by one object. Finding a solution for
the clustering problem can then be seen as a search in the graph.
What is the difference between PAM and CLARANS regarding searching this graph?
3(4)
5. Hierarchical clustering (4p)
Describe the principles and ideas regarding Agglomorative Hierarchical Clustering.
Show the different steps of the algorithm using the dissimilarity matrix below
and complete link clustering. Give partial results after each step.
| 1
2 3 4 5
----------------------------------1 | 0
2 | 2
0
3 | 4
3 0
4 | 10
7 9 0
5 | 8
5 6 1 0
6. Density and grid-based clustering (3p)
Describe the principles and ideas regarding the CLIQUE algorithm.
What is the main purpose of the algorithm? For what kind of purpose would you use this
algorithm? Explain the major steps. What are the strenghts and weaknesses of the algorithm?
7. Potpourri (2p+1p+1p=4p)
a. Given the following two objects.
- What is the distance between the objects if all variables are symmetric?
- What is the distance between the objects if all variables are asymmetric?
Object1
Object2
Attribute1
1
1
Attribute2 Attribute3 Attribute4
1
0
0
0
1
0
b. Give an example of an ordinal variable. Give an example of a nominal variable.
c. When is a point p directly density-reachable from a point q w.r.t Eps and
MinPts?
4(4)
Download