Multivariate Statistics. Assessment II, Fall 2011

Students registered for ST323 do Questions 2, 3, 4. MSc students and students registered for ST412 do Questions 1, 3, 4.

You may use any software you like for numerical calculations of eigenvalues, eigenvectors, matrix inverses, correlation matrices, Q-Q plots and scatter plots, or any other numerical calculation necessary. If you do so, you must state which software you used and attach the output. One useful statistical package is Minitab, which you may download from http://www2.warwick.ac.uk/services/its/servicessupport/software/list

Solutions are due on January 14 at noon.

Question 1. A data set consists of $n = 276$ measurements of skull and bone size of white fowl. Each measurement has six components: skull length, skull breadth, femur length, tibia length, humerus length and ulna length, recorded as a six-dimensional vector $(X_1, X_2, X_3, X_4, X_5, X_6)$. The sample correlation matrix is

$$
\begin{pmatrix}
1.000 & 0.505 & 0.569 & 0.602 & 0.621 & 0.603 \\
      & 1.000 & 0.422 & 0.467 & 0.482 & 0.450 \\
      &       & 1.000 & 0.926 & 0.877 & 0.878 \\
      &       &       & 1.000 & 0.874 & 0.894 \\
      &       &       &       & 1.000 & 0.937 \\
      &       &       &       &       & 1.000
\end{pmatrix},
$$

where the lower triangle is left empty as the matrix is symmetric.

A. Give the principal component solution for the factor problem with two factors.
B. Is the choice of only two factors well justified? Or would you choose a different number of factors?
C. What proportion of the total variance is due to each of the two factors?
D. Compute the residual matrix and, based on it, comment on the efficiency of the factor solution you provided.

Question 2. Consider the covariance matrix

$$
\begin{pmatrix}
11.072 &       &       \\
 8.019 & 6.417 &       \\
 8.160 & 6.005 & 6.773
\end{pmatrix},
$$

where the upper triangle is left empty as the matrix is symmetric.

A. Give the principal component solution for the factor problem with one factor.
B. Is the choice of only one factor well justified? Or would you choose a different number of factors?
C. What proportion of the total variance is due to that factor?
D. Compute the residual matrix and, based on it, comment on the efficiency of the factor solution you provided.

Question 3. Consider the data table

Individual    X1     X2     X3
     1        3.7   48.5    9.3
     2        5.7   65.1    8.0
     3        3.8   47.2   10.9
     4        3.2   53.2   12.0
     5        3.1   55.5    9.7
     6        4.6   36.1    7.9
     7        2.4   24.8   14.0
     8        7.2   33.1    7.6
     9        6.7   47.4    8.5
    10        5.4   54.1   11.3
    11        3.9   36.9   12.7
    12        4.5   58.8   12.3
    13        3.5   27.8    9.8
    14        4.5   40.2    8.4
    15        1.5   13.5   10.1
    16        8.5   56.4    7.1
    17        4.5   71.6    8.2
    18        6.5   52.8   10.9
    19        4.1   44.1   11.2
    20        5.5   40.9    9.4

A. Construct normal probability plots (Q-Q plots) for each variable $X_1, X_2, X_3$. Construct the pairwise scatter plots. Does the assumption of multivariate normality seem to be justified?
B. Determine the axes of the 90% confidence ellipsoid for $\mu$. Determine the lengths of these axes.

Question 4. Consider two normal populations $\pi_1$, $\pi_2$ with means $\mu_1^T = (10, 15)$, $\mu_2^T = (10, 25)$ and covariance matrices

$$
\Sigma_1 = \begin{pmatrix} 18 & 12 \\ 12 & 32 \end{pmatrix}
\quad \text{and} \quad
\Sigma_2 = \begin{pmatrix} 20 & -7 \\ -7 & 5 \end{pmatrix}.
$$

Assume equal prior probabilities and misclassification costs of $c(2|1) = 10$ and $c(1|2) = 73.89$.

A. Write down explicitly the posterior probabilities $P(\pi_1 | x)$ and $P(\pi_2 | x)$ for an observation $x$.
B. Classify the set of observations $(10, 15)^T$, $(12, 17)^T$, $(14, 19)^T$, $(16, 21)^T$, $(18, 23)^T$.
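
Since the sheet allows any software for numerical work, the calculation behind Question 4 can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a model answer. It assumes that c(2|1) denotes the cost of assigning an item from population 1 to population 2 (and c(1|2) the reverse), and that classification minimises the expected cost of misclassification. The names normal_density_2d, posteriors and classify are illustrative and do not come from the sheet.

```python
import math

# Parameters copied from Question 4.
MU1, SIGMA1 = (10.0, 15.0), ((18.0, 12.0), (12.0, 32.0))
MU2, SIGMA2 = (10.0, 25.0), ((20.0, -7.0), (-7.0, 5.0))
PRIORS = (0.5, 0.5)       # equal prior probabilities
COST_2_GIVEN_1 = 10.0     # c(2|1): assumed cost of assigning a population-1 item to population 2
COST_1_GIVEN_2 = 73.89    # c(1|2): assumed cost of assigning a population-2 item to population 1


def normal_density_2d(x, mean, cov):
    """Bivariate normal density at x for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def posteriors(x):
    """Posterior probabilities of populations 1 and 2 given x (Bayes' rule)."""
    w1 = PRIORS[0] * normal_density_2d(x, MU1, SIGMA1)
    w2 = PRIORS[1] * normal_density_2d(x, MU2, SIGMA2)
    total = w1 + w2
    return w1 / total, w2 / total


def classify(x):
    """Return 1 or 2, whichever assignment has the smaller expected cost."""
    p1, p2 = posteriors(x)
    cost_if_assigned_to_1 = COST_1_GIVEN_2 * p2  # wrong only if x came from population 2
    cost_if_assigned_to_2 = COST_2_GIVEN_1 * p1  # wrong only if x came from population 1
    return 1 if cost_if_assigned_to_1 <= cost_if_assigned_to_2 else 2


if __name__ == "__main__":
    observations = [(10, 15), (12, 17), (14, 19), (16, 21), (18, 23)]
    for obs in observations:
        p1, p2 = posteriors(obs)
        print(obs, round(p1, 4), round(p2, 4), "-> population", classify(obs))
```

Because the two covariance matrices differ, the resulting rule is quadratic in x rather than linear. The unequal costs shift the decision boundary away from the point where the two posterior probabilities are equal.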