Stat 407 Exam 1

Name:

1. (5pts) Calculate the mean ($\bar{X}$) and variance-covariance ($S_n$) arrays for the following data:

   x1    9    2    6    4
   x2   12    8    6    6

2. (5pts) Explain what it means to standardize a variable. What is the purpose of standardizing variables during multivariate analysis?

3. (5pts) Calculate the pooled variance-covariance matrix, given the two variance-covariance matrices (assume the two sample sizes are equal):

   $S_1 = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}$,   $S_2 = \begin{bmatrix} 6 & -2 \\ -2 & 4 \end{bmatrix}$

   Do you think it made sense to pool them? Explain your reasoning.

4. (5pts) How might you detect an outlier using a parallel coordinate plot?

5. The following questions refer to measurements made on the size of the carapace and gender of painted turtles (Jolicoeur and Mosimann, 1960). The variables are Length, Width and Height (in mm), and gender (1 = Female, 2 = Male).

   (a) (3pts) Describe the structure in the scatterplot matrix of the raw variables.

   (b) (2pts) As accurately as possible, plot the point $X_0 = (98, 81, 38, 1)'$ on the scatterplot matrix.

   (c) (3pts) How could you design a plot that would better illustrate the size differences in the physical measurements between females and males?

   (d) (2pts) When doing principal component analysis on this data, would it be better to use the covariance matrix or the correlation matrix? Explain your answer.

   (e) (3pts) From the attached SAS output, fill in the table of eigenvectors, eigenvalues, and cumulative proportion of total variance, for males and females separately.

                        Females             Males
       Variable         e1    e2    e3      e1    e2    e3
       Length
       Width
       Height
       Variance
       Cum % Tot Var

   (f) (2pts) Draw a scree plot for the females.

   (g) (2pts) How many principal components would you suggest using to reduce the dimensionality of this data (for the females only)?

   (h) (2pts) Write down the value of the variance of the first principal component (of the females).

   (i) (3pts) Interpret the first principal component for the females. Is it the same interpretation for the males?

6.
The following questions are about a data set measured on Australian crabs. There are 200 measurements on 2 species, both males and females, of crabs. The classes are:

   Species: Blue Crabs = 1, Orange Crabs = 2
   Sex:     Males = 1, Females = 2

   A new class variable was created: 1 = Blue Male, 2 = Blue Female, 3 = Orange Male, 4 = Orange Female. The variables are:

       CL = Carapace Length
       CW = Carapace Width
       FL = Frontal Lobe
       RW = Rear Width
       BD = Body Depth

   (a) (2pts) On the attached SAS output, highlight (point out) the B (between-group covariance) matrix.

   (b) (3pts) Explain conceptually what the between-group covariance matrix is.

   (c) (2pts) Linear discriminant analysis was used to build a classification rule. Write down the confusion table for the classification rule.

   (d) (2pts) Calculate the apparent error rate of the procedure.

   (e) (2pts) Circle the points corresponding to crabs that were misclassified on the appropriate plot in the SAS output.

   (f) (2pts) From the SAS output, which group would a crab with measurements (FL = 22.2, RW = 18.0, CL = 44.0, CW = 47.5, BD = 19.1) be classified into? Which species and sex is this?

   (g) (5pts) In the following plot of the crabs data in the discriminant space (not centered around the mean), which of the points, X, Y or Z, is most likely to be the new observation? Why?

7. (5pts) Are the results from the following two procedures for building a classification rule for 3 groups likely to differ? Explain your answer.

   (a) Work pairwise to develop 3 pairwise classification rules (1 vs 2, 1 vs 3, 2 vs 3) and use this collection of rules to classify new observations into groups.

   (b) Compute the 2D discriminant space, which is the 2D projection that best separates the 3 groups. Then build a classification rule that classifies all 3 groups with one rule (if ... then group 1, else if ... then group 2, else group 3).