Stat 407
Exam 1
Name
1. (5pts) Calculate the mean vector ($\bar{\mathbf{X}}$) and variance-covariance matrix ($S_n$) for the following data:

x1   x2
 9   12
 2    8
 6    6
 4    6
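As a check on hand arithmetic, the Python sketch below computes $\bar{\mathbf{X}}$ and $S_n$ for the four observations above, pairing the two columns row by row. It assumes $S_n$ uses divisor $n$ (the convention that distinguishes $S_n$ from the unbiased $S$ with divisor $n - 1$); the function names are illustrative, not from any library.

```python
# Mean vector and variance-covariance matrix S_n for question 1.
# Assumption: S_n divides by n; divide by (n - 1) instead to get the unbiased S.

def mean_vector(rows):
    """Column means of a list of observation rows."""
    n = len(rows)
    p = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]


def covariance_n(rows):
    """Variance-covariance matrix with divisor n."""
    n = len(rows)
    p = len(rows[0])
    xbar = mean_vector(rows)
    return [
        [sum((row[i] - xbar[i]) * (row[k] - xbar[k]) for row in rows) / n for k in range(p)]
        for i in range(p)
    ]


# Each tuple is one observation (x1, x2).
data = [(9, 12), (2, 8), (6, 6), (4, 6)]
xbar = mean_vector(data)
S_n = covariance_n(data)
print(xbar)  # [5.25, 8.0]
print(S_n)   # [[6.6875, 4.0], [4.0, 6.0]]
```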
2. (5pts) Explain what it means to standardize a variable. What is the purpose of standardizing variables
during multivariate analysis?
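For reference, standardizing replaces each value $x$ with the z-score $z = (x - \bar{x})/s$, so the new variable has mean 0 and standard deviation 1 regardless of its original units. A minimal sketch, assuming the sample standard deviation $s$ (divisor $n - 1$) and reusing the $x_1$ column from question 1 purely as an illustration:

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores (x - mean) / s, with s the sample standard deviation (divisor n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


x1 = [9, 2, 6, 4]
z1 = standardize(x1)
# The standardized variable has mean 0 and standard deviation 1 (up to rounding).
print(mean(z1), stdev(z1))
```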
3. (5pts) Calculate the pooled variance-covariance matrix, given the two variance-covariance matrices below (assume
the two sample sizes are equal):

$$S_1 = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}, \qquad S_2 = \begin{bmatrix} 6 & -2 \\ -2 & 4 \end{bmatrix}$$

Does it make sense to pool them? Explain your reasoning.
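The usual pooled estimate is $S_{\text{pooled}} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2}$, which reduces to $(S_1 + S_2)/2$ when $n_1 = n_2$. The sketch below applies that formula; the sample sizes $n_1 = n_2 = 10$ are a hypothetical choice, since any equal pair gives the same result.

```python
def pooled_covariance(S1, S2, n1, n2):
    """S_pooled = ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2), element by element."""
    w1 = (n1 - 1) / (n1 + n2 - 2)
    w2 = (n2 - 1) / (n1 + n2 - 2)
    return [[w1 * a + w2 * b for a, b in zip(r1, r2)] for r1, r2 in zip(S1, S2)]


S1 = [[4, 2], [2, 6]]
S2 = [[6, -2], [-2, 4]]
# Hypothetical equal sample sizes: both weights become 1/2, so S_pooled = (S1 + S2) / 2.
S_pooled = pooled_covariance(S1, S2, n1=10, n2=10)
print(S_pooled)  # [[5.0, 0.0], [0.0, 5.0]]
```

Comparing the off-diagonal entries of $S_1$, $S_2$ and the pooled result is a natural starting point for the second part of the question.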
4. (5pts) How might you detect an outlier using a parallel coordinate plot?
5. The following questions refer to measurements made on the size of the carapace and gender of painted
turtles (Jolicoeur and Mosimann, 1960). The variables are Length, Width and Height (in mm), and gender
(1 =Female, 2 =Male).
(a) (3pts) Describe the structure in the scatterplot matrix of the raw variables.
(b) (2pts) As accurately as possible, plot the point $\mathbf{X}_0 = (98, 81, 38, 1)'$ on the scatterplot matrix.
(c) (3pts) How could you design a plot that would better illustrate the size differences on the physical
measurements between females and males?
(d) (2pts) When doing principal component analysis on this data, would it be better to use the covariance
matrix or the correlation matrix? Explain your answer.
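Background for this choice: the correlation matrix is $R = D^{-1/2} S D^{-1/2}$, where $D$ is the diagonal matrix of variances, so PCA on $R$ is the same as PCA on the covariance matrix of standardized variables. When variances differ greatly, covariance-based components tend to be dominated by the largest-variance variables. A minimal sketch of the conversion, using a hypothetical 2×2 covariance matrix (not the turtle data):

```python
import math


def cov_to_corr(S):
    """R = D^{-1/2} S D^{-1/2}: divide each entry by the product of the two standard deviations."""
    p = len(S)
    sd = [math.sqrt(S[i][i]) for i in range(p)]
    return [[S[i][k] / (sd[i] * sd[k]) for k in range(p)] for i in range(p)]


# Hypothetical covariance matrix: variable 2 has a much larger variance than variable 1.
S = [[4.0, 2.0], [2.0, 9.0]]
R = cov_to_corr(S)
print(R)  # diagonal entries are 1.0; off-diagonal is 2 / (2 * 3)
```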
(e) (3pts) From the attached SAS output, fill in the table of eigenvectors, eigenvalues, and cumulative proportion of total variance for males and females separately.
                 Females                  Males
Variable         e1      e2      e3       e1      e2      e3
Length
Width
Height
Variance
Cum % Tot Var
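The quantities in this table come from the eigen-decomposition of the covariance (or correlation) matrix: the variance of the k-th principal component is the k-th largest eigenvalue, and the cumulative proportion of total variance is the running sum of eigenvalues divided by their total (the trace). A minimal NumPy sketch, using a hypothetical diagonal matrix rather than the turtle data; note that eigenvector signs are arbitrary and may differ from the SAS output.

```python
import numpy as np


def pca_summary(S):
    """Eigenvalues (descending), eigenvectors (as columns), and cumulative proportion of total variance."""
    S = np.asarray(S, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh handles symmetric matrices; returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder so the first PC has the largest variance
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    cum_prop = np.cumsum(eigvals) / eigvals.sum()  # total variance = trace(S) = sum of eigenvalues
    return eigvals, eigvecs, cum_prop


# Hypothetical 3x3 covariance matrix (not the turtle data).
S = np.diag([6.0, 3.0, 1.0])
vals, vecs, cum = pca_summary(S)
print(vals)  # variances of PC1, PC2, PC3
print(cum)   # cumulative proportion of total variance
```

Plotting `vals` against the component number gives a scree plot.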
(f) (2pts) Draw a scree plot for the females.
(g) (2pts) How many principal components would you suggest using to reduce the dimensionality of this
data (for the females only)?
(h) (2pts) What is the variance of the first principal component for the females?
(i) (3pts) Interpret the first principal component for the females. Is it the same interpretation for the
males?
6. The following questions are about a data set of measurements on Australian crabs. There are 200 crabs
from 2 species, with both males and females of each species. The classes are:
Species: Blue Crabs = 1, Orange Crabs = 2
Sex: Males = 1, Females = 2
A new class variable was created:
1 = Blue Male, 2 = Blue Female, 3 = Orange Male, 4 = Orange Female
and the variables are:
CL = Carapace Length
CW = Carapace Width
FL = Frontal Lobe
RW = Rear Width
BD = Body Depth
(a) (2pts) On the attached SAS output highlight (point out) the B (Between group covariance) matrix.
(b) (3pts) Explain conceptually what the between group covariance matrix is.
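One common way to write the between-groups matrix is $B = \sum_{k} n_k (\bar{\mathbf{x}}_k - \bar{\mathbf{x}})(\bar{\mathbf{x}}_k - \bar{\mathbf{x}})'$: it measures how far the group mean vectors spread around the overall mean. Software, SAS included, may print a rescaled version (divided by a degrees-of-freedom term), so the exact scaling here is an assumption. A minimal NumPy sketch on hypothetical toy data:

```python
import numpy as np


def between_group_sscp(X, labels):
    """B = sum_k n_k (xbar_k - xbar)(xbar_k - xbar)^T, the between-groups SSCP matrix."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    xbar = X.mean(axis=0)                       # overall mean vector
    p = X.shape[1]
    B = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = (Xg.mean(axis=0) - xbar).reshape(-1, 1)  # group mean minus overall mean, as a column
        B += len(Xg) * (d @ d.T)
    return B


# Hypothetical toy data: two groups of two observations on two variables.
X = [[1, 2], [3, 2], [5, 6], [7, 6]]
labels = [1, 1, 2, 2]
B = between_group_sscp(X, labels)
print(B)
```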
(c) (2pts) Linear discriminant analysis was used to build a classification rule. Write down the confusion
table for the classification rule.
(d) (2pts) Calculate the apparent error rate of the procedure.
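The apparent error rate (APER) is the fraction of training observations that the rule misclassifies: the off-diagonal total of the confusion table divided by the grand total. A minimal sketch; the 4-group counts below are hypothetical placeholders, not the values from the SAS output.

```python
def apparent_error_rate(confusion):
    """APER = misclassified / total, where confusion[i][j] counts true group i classified into group j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical confusion table (rows = true group 1..4, columns = predicted group 1..4).
confusion = [
    [48, 2, 0, 0],
    [1, 49, 0, 0],
    [0, 0, 47, 3],
    [0, 0, 2, 48],
]
print(apparent_error_rate(confusion))  # 0.04
```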
(e) (2pts) Circle the points corresponding to crabs that were misclassified on the appropriate plot in the
SAS output.
(f) (2pts) From the SAS output, which group would a crab with measurements (FL = 22.2, RW =
18.0, CL = 44.0, CW = 47.5, BD = 19.1) be classified into? Which species and sex is this?
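With linear discriminant functions of the form constant + coefficients · x (one per group, as printed in a "Linear Discriminant Function" table), a new observation is assigned to the group with the largest score. The sketch below shows that rule for the crab in part (f); the two groups and all constants and coefficients are hypothetical placeholders to be replaced by the values in the SAS output.

```python
def classify_lda(x, functions):
    """Return the group whose score constant + sum(coef[var] * x[var]) is largest, plus all scores."""
    scores = {
        group: f["constant"] + sum(c * x[var] for var, c in f["coef"].items())
        for group, f in functions.items()
    }
    return max(scores, key=scores.get), scores


# Measurements of the new crab from part (f).
crab = {"FL": 22.2, "RW": 18.0, "CL": 44.0, "CW": 47.5, "BD": 19.1}

# Hypothetical placeholder functions for two illustrative groups (NOT the SAS values).
functions = {
    1: {"constant": -10.0, "coef": {"FL": 1.0, "RW": 0.0, "CL": 0.0, "CW": 0.0, "BD": 0.0}},
    2: {"constant": -20.0, "coef": {"FL": 1.0, "RW": 0.0, "CL": 0.0, "CW": 0.0, "BD": 0.0}},
}
group, scores = classify_lda(crab, functions)
print(group, scores)
```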
(g) (5pts) In the following plot of the crabs data in the discriminant space (not centered around the mean),
which of the points, X, Y or Z, is most likely to be the new observation? Why?
7. (5pts) Are the results from the following two procedures for building a classification rule for 3 groups likely
to differ? Explain your answer.
(a) Work pairwise to develop 3 pairwise classification rules (1 vs 2, 1 vs 3, 2 vs 3) and use this collection
of rules to classify new observations into groups.
(b) Compute the 2D discriminant space, which is the 2D projection which best separates the 3 groups.
Then build a classification rule which classifies all 3 groups with one rule (if ... then group 1, else if
... then group 2, else group 3).