clustering

Homework clustering lesson
CSCI 448/548 fall term 2015
Name________________________
1. The following tree was built using the UPGMA approach utilizing the provided matrix.
What are the following distances as calculated during the tree build:
A—CD
B—CD
AB—CD

     A    B    C    D
A    0    3    5    6
B         0    6    7
C              0    3
D                   0

[Figure: UPGMA tree with leaves A, B, C, D]
Describe the above tree in nested parenthesis notation.
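For reference, the UPGMA procedure used in questions 1 and 2 can be sketched in a few lines. The following is a minimal Python sketch (not R, and not a solution to the matrices in this handout): the labels P, Q, R, S and their distances are hypothetical, and the function name upgma is an assumption made for illustration. At each step it merges the closest pair of clusters, taking the distance between two clusters to be the mean of all leaf-to-leaf distances, and it builds the nested parenthesis string as it goes.

```python
from itertools import combinations


def upgma(labels, d):
    """Cluster `labels` by UPGMA.

    d maps frozenset({a, b}) -> distance between leaves a and b.
    Returns (nested-parenthesis string, list of (left, right, distance) merges).
    """
    # Each cluster is (nested-parenthesis string, tuple of member leaves).
    clusters = [(name, (name,)) for name in labels]
    merges = []

    def avg(c1, c2):
        # UPGMA cluster distance: mean of all leaf-to-leaf distances.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(d[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of current clusters (the first pair wins ties).
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        merges.append((c1[0], c2[0], avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append((f"({c1[0]},{c2[0]})", c1[1] + c2[1]))
    return clusters[0][0], merges


# Hypothetical example data (NOT one of the matrices in this assignment).
LABELS = ["P", "Q", "R", "S"]
DIST = {frozenset(k): v for k, v in {
    ("P", "Q"): 2, ("P", "R"): 4, ("Q", "R"): 6,
    ("P", "S"): 10, ("Q", "S"): 12, ("R", "S"): 8,
}.items()}

tree, merges = upgma(LABELS, DIST)
print(tree)  # (S,(R,(P,Q)))
for left, right, dist in merges:
    print(left, right, dist)
```

Each entry in merges records the distance between the two clusters at the moment they were joined, which is the kind of value question 1 asks for. Note that ties are broken here by taking the first closest pair found; other tools may break ties differently.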
2. Using the UPGMA approach, build the dendrogram associated with the following
distance matrix. Feel free to use R. You can paste the dendrogram into this document.
There are no special formatting requirements (i.e. margins, etc.)
      A      B      C      D      E
A     0
B     9.5    0
C     10.5   3      0
D     9.5    6      7      0
E     10.5   7      8      3      0
D and E are one of the two closest pairs (B and C are tied at the same distance of 3). Show the
distance matrix after D and E have been merged.
Show the tree in nested parenthetic notation.
3. Write an R script that generates 10,000 datasets randomly distributed in the Iris dataset
vector space (uniform random; i.e. each random vector is randomly generated using a
uniform random distribution for each dimension with the range of the minimum and
maximum values found in that dimension). Calculate the Cophenetic Correlation
Coefficient (CPCC) for each random dataset. Calculate the CPCC for the Iris data itself
and indicate how likely it is that a random dataset could generate such a score.
Insert the code of the R script here followed by the p-value generated (the probability
that a random dataset would yield the CPCC of the Iris data).
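The overall shape of the randomization test described in question 3 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the required R script: the eight 2-D points in TOY are a hypothetical stand-in for the Iris measurements, only 200 random datasets are drawn to keep it quick, average linkage (UPGMA) is assumed as the clustering method, and the helper names cpcc, pearson, and random_like are invented for illustration. The p-value here is taken as the fraction of random datasets whose CPCC is at least as high as the observed one, which is one reasonable reading of the question.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    # Plain Pearson correlation between two equal-length lists.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cpcc(points):
    """Cophenetic correlation of an average-linkage tree built on `points`."""
    n = len(points)
    d = {(i, j): math.dist(points[i], points[j])
         for i, j in combinations(range(n), 2)}
    clusters = [(i,) for i in range(n)]
    coph = {}

    def avg(c1, c2):
        # Mean of all point-to-point distances between the two clusters.
        total = sum(d[(min(a, b), max(a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        height = avg(c1, c2)
        # Cophenetic distance of two points: height of the merge that first joins them.
        for a in c1:
            for b in c2:
                coph[(min(a, b), max(a, b))] = height
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    pairs = sorted(d)
    return pearson([d[p] for p in pairs], [coph[p] for p in pairs])


def random_like(points, rng):
    # Uniform random points within the per-dimension min/max of `points`.
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    return [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in points]


# Hypothetical toy data: two tight groups in 2-D (a stand-in, NOT the Iris data).
TOY = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.3),
       (5.0, 5.2), (5.1, 4.9), (4.8, 5.0), (5.2, 5.1)]

rng = random.Random(0)  # fixed seed so the run is repeatable
observed = cpcc(TOY)
null_scores = [cpcc(random_like(TOY, rng)) for _ in range(200)]
# Empirical one-sided p-value: share of random datasets scoring at least as high.
p_value = sum(s >= observed for s in null_scores) / len(null_scores)
print(observed, p_value)
```

With a finite number of random datasets the empirical p-value can come out as exactly 0; in that case it is more accurate to report it as "less than 1 divided by the number of random datasets" than as zero.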
To turn in this homework assignment, go to the Moodle course page and follow these steps:
1. Click on the link Homework: clustering lesson (upload the assignment) (Due Thursday at Midnight).
2. Click on the Browse button to navigate to the directory containing this document.
3. Highlight this file and click on Open.
4. Click on Upload this file.