Linköpings Universitet
IDA/Statistik LH
732A37 Multivariate Statistical Methods, 6hp
Exam in Multivariate Statistical Methods, 2011-06-03
Time allowed: 8-12
Allowed aids: Calculator. The book: Johnson, Wichern: Applied Multivariate Statistical Analysis. Notes in the book and a copy of the book are allowed.
Assisting teacher: Lotta Hallberg
Grades: A = 19-20 points, B = 16-18p, C = 12-15p, D = 9-11p, E = 6-8p
Provide a detailed report that motivates the results.
_________________________________________________________________________________________

1 Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_{10}$ be a random sample of size n = 10 from an $N_3(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ population. Specify each of the following completely:
i) The distribution of $(\mathbf{X}_5 - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{X}_5 - \boldsymbol{\mu})$ 1p
ii) The distribution of $\sqrt{n}(\bar{\mathbf{X}} - \boldsymbol{\mu})$ 1p
iii) The distribution of $\mathbf{A}(9\mathbf{S})\mathbf{A}'$, where $\mathbf{A} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ 2p

2 Let $\mathbf{X}$ be $N_3(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu}' = [1, -1, 2]$ and $\boldsymbol{\Sigma} = \begin{pmatrix} 4 & 0 & -1 \\ 0 & 5 & 0 \\ -1 & 0 & 2 \end{pmatrix}$.
a) Determine whether the following variables are independent. Explain. 2p
i) $(X_1, X_3)$ and $X_2$
ii) $X_1$ and $X_1 + 3X_2 - 2X_3$
b) Find the distribution of $X_1 + 3X_2 - 2X_3$. 2p

3 Below are summary statistics for the variables Sepal length, Sepal width, Petal length and Petal width for two different species of the Iris flower. We want to find out whether the mean vectors of these variables differ between the two species, that is, to test $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$. Hotelling's $T^2$-statistic shall be used for this test. The questions follow the outputs.
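The two-sample statistic can be evaluated directly from the summary output. The Python snippet below is a minimal sketch under three assumptions:

- the population covariance matrices are equal, so the pooled matrix $\mathbf{S}_p$ applies;
- the rounded means from the descriptive statistics are used as inputs;
- the printed invSp matrix from the output below is used as the inverse.

It computes $T^2 = \frac{n_1 n_2}{n_1+n_2}(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_p^{-1}(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)$ together with its F transformation. The helper name `hotelling_t2` is illustrative only.

```python
# Sketch: two-sample Hotelling T^2 from summary statistics (standard library only).
# Assumptions: equal population covariance matrices, so the pooled Sp is used;
# inputs are the rounded means and the printed (rounded) inverse invSp.

def hotelling_t2(mean1, mean2, inv_sp, n1, n2):
    """Hypothetical helper: return (T2, F, quadratic form) for H0: mu1 - mu2 = 0."""
    p = len(mean1)
    d = [a - b for a, b in zip(mean1, mean2)]  # xbar1 - xbar2
    # Quadratic form d' Sp^{-1} d
    quad = sum(d[i] * inv_sp[i][j] * d[j] for i in range(p) for j in range(p))
    # [(1/n1 + 1/n2) Sp]^{-1} = n1*n2/(n1 + n2) * Sp^{-1}
    t2 = n1 * n2 / (n1 + n2) * quad
    # Under H0: (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T2 ~ F(p, n1 + n2 - p - 1)
    f = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f, quad


# Order: Sepal length, Sepal width, Petal length, Petal width
mean_species1 = [5.0, 3.4, 1.4, 0.2]
mean_species2 = [5.9, 2.8, 4.3, 1.3]
inv_sp = [
    [11.63, -6.55, -8.00, 3.88],
    [-6.55, 14.24, 3.27, -10.85],
    [-8.00, 3.27, 21.50, -26.66],
    [3.88, -10.85, -26.66, 87.67],
]

t2, f, quad = hotelling_t2(mean_species1, mean_species2, inv_sp, n1=50, n2=50)
print(f"d' invSp d = {quad:.2f}, T2 = {t2:.1f}, F = {f:.1f}")
```

With these inputs the quadratic form is about 107,29. This gives $T^2 = 25 \cdot 107{,}29 \approx 2\,682$ and $F \approx 650$ on (4, 95) degrees of freedom. That is far above the 5% critical value $F_{4,95}(0{,}05) \approx 2{,}47$ and also far from the 4,29 quoted in question b).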
Descriptive Statistics: Sepal length; Sepal width; Petal length; Petal width

Variable       Species   N    Mean   StDev
Sepal length   1         50   5,0    0,3525
               2         50   5,9    0,5162
Sepal width    1         50   3,4    0,3791
               2         50   2,8    0,3138
Petal length   1         50   1,4    0,1737
               2         50   4,3    0,4699
Petal width    1         50   0,2    0,1054
               2         50   1,3    0,1978

[Figure: Histogram of Sepal length; Sepal width; Petal length; Petal width, with fitted normal curves; panel variable: Species. The panel legends give N = 50 for each species and the means 5,006 and 5,936 (Sepal length), 3,428 and 2,77 (Sepal width), 1,462 and 4,26 (Petal length) and 0,246 (Petal width, species 1), with the standard deviations listed above.]

Estimated covariance matrix, S1 (Species 1)
0,124249   0,099216   0,0163551   0,0103306
0,099216   0,143690   0,0116980   0,0092980
0,016355   0,011698   0,0301592   0,0060694
0,010331   0,009298   0,0060694   0,0111061

Estimated covariance matrix, S2 (Species 2)
0,266433   0,0851837   0,182898   0,0557796
0,085184   0,0984694   0,082653   0,0412041
0,182898   0,0826531   0,220816   0,0731020
0,055780   0,0412041   0,073102   0,0391061

Pooled covariance matrix, Sp
0,195341   0,092200   0,099627   0,0330551
0,092200   0,121080   0,047176   0,0252510
0,099627   0,047176   0,125488   0,0395857
0,033055   0,025251   0,039586   0,0251061

Inverted Sp, invSp
 11,63    -6,55    -8,00     3,88
 -6,55    14,24     3,27   -10,85
 -8,00     3,27    21,50   -26,66
  3,88   -10,85   -26,66    87,67

a) Which assumptions have to be made to perform Hotelling's $T^2$-test? Are the assumptions fulfilled? 2p
b) Show what the components of the $T^2$-statistic look like with these data. Specify the distribution.
Show that the observed value of $T^2$ is 4,29. Perform the test at the 5% significance level. 3p

4 Let the random vector $\mathbf{X}' = [X_1, X_2]$ have covariance matrix $\boldsymbol{\Sigma} = \begin{pmatrix} 6 & 4 \\ 4 & 3 \end{pmatrix}$.
Determine the principal components and find the proportion of the total variance of $\mathbf{X}$ explained by the first component. 3p

5 Five variables have been recorded for 14 different counties:
- Total population (thousands)
- Median school years
- Total employment (thousands)
- Health services employment (hundreds)
- Median value of homes (10 000s of dollars) (income)

Descriptive Statistics: tot pop; school; employ; helth service; income

Variable        Mean     StDev   Minimum   Maximum
tot pop          4,323   2,075    1,523     8,044
school          14,014   1,329   12,200    17,000
employ           1,952   0,895    0,597     3,641
helth service    2,171   1,403    0,750     5,520
income           2,454   0,710    1,720     4,250

[Figure: Histogram of tot pop; school; employ; helth service; income, with fitted normal curves (N = 14 for each variable; panel means and standard deviations as in the table above).]

Factor Analysis: tot pop; school; employ; helth service; income
Maximum Likelihood Factor Analysis of the Correlation Matrix
* NOTE * Heywood case

Unrotated Factor Loadings and Communalities
Variable        Factor1   Factor2   Communality
tot pop          0,971     0,160     0,968
school           0,494     0,833     0,938
employ           1,000     0,000     1,000
helth service    0,848    -0,395     0,875
income          -0,249     0,375     0,202
Variance         2,9678    1,0159    3,9837
% Var            0,594     0,203     0,797

Rotated Factor Loadings and Communalities
Varimax Rotation
Variable        Factor1   Factor2   Communality
tot pop          0,718     0,673     0,968
school          -0,052     0,967     0,938
employ           0,831     0,556     1,000
helth service    0,924     0,143     0,875
income          -0,415     0,173     0,202
Variance         2,2354    1,7483    3,9837
% Var            0,447     0,350     0,797

Factor Score Coefficients
Variable        Factor1   Factor2
tot pop         -0,165     0,246
school          -0,528     0,789
employ           1,150     0,080
helth service    0,116    -0,173
income          -0,018     0,027

a) What assumptions have to be fulfilled to carry out the analyses above? 1p
b) What does the communality measure? 1p
c) Try to put names on the two factors. 1p
d) One observation is (5,935  14,2  2,265  2,27  2,91) and its standardized value is (0,777  0,140  0,350  0,070  0,642). Calculate the two factor scores. 1p
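Question 5 d) reduces to two weighted sums. Each factor score is the inner product of the standardized observation with the corresponding column of the Factor Score Coefficients table. The Python snippet below is a minimal sketch, assuming the printed (rounded) coefficients are applied directly to the standardized values given in the question. The dictionary layout and the helper name `factor_score` are illustrative choices.

```python
# Factor scores for question 5 d): each score is a weighted sum of the
# standardized observation, with weights from "Factor Score Coefficients".
# Variable order: tot pop, school, employ, helth service, income
coef = {
    "Factor1": [-0.165, -0.528, 1.150, 0.116, -0.018],
    "Factor2": [0.246, 0.789, 0.080, -0.173, 0.027],
}
# Standardized observation as printed in the question, z_i = (x_i - mean_i) / sd_i
z = [0.777, 0.140, 0.350, 0.070, 0.642]


def factor_score(weights, z):
    """Inner product of a coefficient vector and the standardized observation."""
    return sum(w * zi for w, zi in zip(weights, z))


scores = {name: factor_score(w, z) for name, w in coef.items()}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the scores come out at about 0,197 for Factor1 and 0,335 for Factor2. Small differences from software output can arise because the printed coefficients are rounded to three decimals.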
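For question 4, the principal components are given by the normalized eigenvectors of $\boldsymbol{\Sigma}$. The proportion of total variance explained by the first component is $\lambda_1/(\lambda_1+\lambda_2)$. Because $\boldsymbol{\Sigma}$ is 2×2, the eigenvalues follow from $\lambda^2 - \operatorname{tr}(\boldsymbol{\Sigma})\lambda + \det(\boldsymbol{\Sigma}) = 0$, so this sketch needs only the standard library. It assumes a symmetric matrix with nonzero off-diagonal element. Eigenvectors are unique only up to sign.

```python
import math

# Covariance matrix from question 4: [[s11, s12], [s12, s22]]
s11, s12, s22 = 6.0, 4.0, 3.0

# Eigenvalues of a symmetric 2x2 matrix: roots of lam^2 - tr*lam + det = 0
tr = s11 + s22
det = s11 * s22 - s12 ** 2
disc = math.sqrt(tr ** 2 - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2


def unit_eigvec(lam):
    """Normalized solution of (s11 - lam) e1 + s12 e2 = 0 (requires s12 != 0)."""
    e1, e2 = s12, lam - s11
    norm = math.hypot(e1, e2)
    return e1 / norm, e2 / norm


e_1 = unit_eigvec(lam1)   # coefficients of Y1
e_2 = unit_eigvec(lam2)   # coefficients of Y2
prop1 = lam1 / tr         # total variance tr(Sigma) = lam1 + lam2

print(f"lambda1 = {lam1:.3f}, lambda2 = {lam2:.3f}")
print(f"Y1 = {e_1[0]:.3f} X1 + {e_1[1]:.3f} X2")
print(f"Y2 = {e_2[0]:.3f} X1 + {e_2[1]:.3f} X2")
print(f"Proportion explained by Y1 = {prop1:.3f}")
```

Here $\operatorname{tr}(\boldsymbol{\Sigma}) = 9$ and $\det(\boldsymbol{\Sigma}) = 2$, so $\lambda = (9 \pm \sqrt{73})/2 \approx 8{,}772$ and $0{,}228$. This gives $Y_1 \approx 0{,}822X_1 + 0{,}570X_2$, and the first component explains about 0,975 of the total variance.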