Stat 407 Exam 2 Name 1. (5pts) Match the following methods with the descriptions. Factor Analysis Provide a simultaneous confidence region for a population mean. Model unmeasurable latent variables using a larger number of measurable variables. Group objects according to known classes. Group objects according to their similarity on variables. Summarize the variance-covariance matrix. Cluster Analysis Principal Component Analysis Classification Confidence Ellipse 2. (4pts) Write down the bivariate normal density function for a normal population with µ1 = 1, µ2 = 3, σ11 = 2, σ22 = 1, ρ12 = −0.8. 3. (5pts) Evaluate T 2 for testing Ho : µ0 = (7, 11) using the data Note that X̄ = " # 6 ,S= 10 " 2 12 8 9 6 9 8 10 # 8.0 −3.3 . −3.3 2.0 If the critical value F2,2 (0.05) = 19.0, is your value of T 2 significant at the 5% level? 1 4. Based on the following data T reat X1 X2 1 10 7 1 11 9 2 5 6 2 4 2 2 5 3 3 6 8 3 5 9 (a) (1pt) Calculate the overall mean. (b) (3pts) Calculate the group means. (c) (3pts) Given the between sums of squares (B) and within sums of squares (W) matrices below write out the MANOVA table. B= " 44.0 23.7 23.7 36.3 (d) (3pts) Calculate the Wilks’ Λ∗ . 2 # , W= " 1.67 2.17 2.17 11.2 # 5. (4pts) From the following plot estimate what the value of Wilks Λ∗ is likely to be. Explain your answer. 6. Answer true or false for the following statements. (a) (1pt) Wilks Λ∗ compares the volumes of the between and within group sums of squares matrices. T or F. (b) (1pt) Paired comparisons can be done using Hotellings T 2 statistic calculated on the differences of correpsonding pairs. T or F. (c) (1pt) Hotellings T 2 is the Euclidean distance between the hypothesized mean and the sample mean. T or F. (d) (1pt) Confidence ellipses can be used to test hypotheses about the mean. T or F. (e) (1pt) Principal components is always a good dimension reduction technique. T or F. (f) (1pt) The data in the following plot could be a sample from a bivariate normal population. T or F. (g) (1pt) An icon plot represents each case by a line trace through parallel variable axes. T or F. (h) (1pt) The quadratic discriminant rule is based on an assumption of unequal variance-covariance matrices between groups. T or F. (i) (1pt) Classification and regression trees generate decision rules based on binary splits on variables. T or F. (j) (1pt) Divisive clustering starts with all the cases in individual clusters. T or F. 3 7. Using the attached SAS output, running k-means clustering on a projection of the flea beetles data answer the following questions. (a) (4pts) Draw the initial seed means and the final cluster means on the following plot. (b) (2pts) Circle the final clusters on the plot below. (c) (2pts) How many iterations were done before convergence? (d) (2pts) How many points are in each final cluster? 4 8. (10pts) From the attached SAS output contains the results of a complete linkage hierarchical cluster analysis on a projection of a subset of size 11 from the flea beetles data. From this and the plot of the subset, draw a dendrogram summarizing the cluster algorithm results. 5 9. (5pts) Based on the following description how would you approach the analysis, and what multivariate methods would you use? Be as specific as possible. The Forest CoverType dataset: Natural resource managers responsible for developing ecosystem management strategies require basic descriptive information including inventory data for forested lands to support their decision-making processes. However, managers generally do not have this type of data for inholdings or neighboring lands that are outside their immediate jurisdiction. One method of obtaining this information is through the use of predictive models. The overall objective of this study is to predict forest cover types in undisturbed forests. The study area included four wilderness areas found in the Roosevelt National Forest of northern Colorado. A total of twelve cartographic measures were available and there were, seven major forest cover types. 6 10. (3pts) Which of the following samples is most likely to correspond to this correlation matrix " # 1 0.3 ? 0.3 1 11. (4pts) Is this two cluster solution most likely to have arisen from complete or single linkage hierarchical clustering? Why? 12. (5pts) Which of the following is not true (just one of them). Explain your answer: (a) A hypothesised mean can be inside the 95% Bonferroni confidence region but outside the 95% simultaneous confidence ellipse. (b) A hypothesised mean can be outside the 95% Bonferroni confidence region but inside the 95% simultaneous confidence ellipse. (c) A hypothesized mean can be rejected by Hotelling T 2 at the 5% significance level but be inside the 95% Bonferroni confidence region. (d) A hypothesized mean can be rejected by Hotelling T 2 at the 5% significance level but be inside the 95% simultaneous confidence ellipse. 7