Linköpings Universitet
IDA/Statistik
LH
732A37 Multivariate Statistical Methods, 6 hp

Exam in Multivariate Statistical Methods, 2011-03-26

Time allowed: 8-12
Allowed aids: Calculator. The book: Johnson, Wichern: Applied Multivariate Statistical Analysis. Notes in the book and a copy of the book are allowed.
Assisting teacher: Lotta Hallberg
Grades: A = 19-20 points, B = 16-18 p, C = 12-15 p, D = 9-11 p, E = 6-8 p

Provide a detailed report that justifies your results.
_________________________________________________________________________________________

1 You are given the random vector $X' = [X_1, X_2, X_3]$ with mean vector $\boldsymbol{\mu}_X' = [3, 2, -2]$ and covariance matrix

\[ \Sigma_X = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} \]

Let

\[ A = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & -2 \end{pmatrix} \]

and let further $Y = AX$.
a) Find E[Y] and Var[Y]. 2p
b) Calculate the total variance and the generalized variance of X and of Y. 1p

2 Let X be $N_3(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ where $\boldsymbol{\mu}' = [1, -1, 2]$ and

\[ \Sigma = \begin{pmatrix} 4 & 0 & -1 \\ 0 & 5 & 0 \\ -1 & 0 & 2 \end{pmatrix} \]

a) Determine whether the following variables are independent. Explain. 2p
   i) $(X_1, X_3)$ and $X_2$
   ii) $X_1$ and $X_1 + 3X_2 - 2X_3$
b) Find the distribution of $X_1 + 3X_2 - 2X_3$. 2p

3 Measurements of $x_1$ = stiffness and $x_2$ = bending strength of a sample of n = 30 pieces of a particular grade of lumber are to be analyzed.
Below you find some statistics and graphs:

Variable    Mean     StDev   Minimum   Maximum
Stiffness   1860.5   352.2   1115.0    2540.0
Bending     8354     1867    4175      12090

Estimated covariance matrix S
  124055    361620
  361620   3486333

Inverse of S
   0.0000116   -0.0000012
  -0.0000012    0.0000004

Eigenvalues of S
  3524786   85601

Eigenvectors of S in columns of matrix P
  0.105740    0.994394
  0.994394   -0.105740

[Figure: Scatterplot of Stiffness vs Bending (y-axis: Stiffness, 1000-2600; x-axis: Bending, 4000-12000)]
[Figure: Histogram of Stiffness with fitted normal curve (Mean 1861, StDev 352.2, N 30)]
[Figure: Histogram of Bending with fitted normal curve (Mean 8354, StDev 1867, N 30)]

a) Use the three graphs to determine whether it is reasonable to assume normality. Explain. 1p
b) Test, using Hotelling's $T^2$, whether $\mu_1 = 2000$ and $\mu_2 = 10\,000$ are plausible values for the means of stiffness and bending strength. Assume normality. Significance level 5%. 3p
c) Calculate the two Bonferroni confidence intervals with a simultaneous confidence level of 95%. 2p

4 Let the random vector $X' = [X_1, X_2]$ have covariance matrix

\[ \Sigma = \begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix} \]

Determine the principal components and find the proportion of the total variance of X explained by the first component.
3p

5 You are given five variables measured in 14 different counties:
Total population (thousands)
Median school years
Total employment (thousands)
Health services employment (hundreds)
Median value homes ($10 000s) (income)

Descriptive Statistics: tot pop; school; employ; health service; income

Variable         Mean     StDev   Minimum   Maximum
tot pop          4.323    2.075   1.523     8.044
school           14.014   1.329   12.200    17.000
employ           1.952    0.895   0.597     3.641
health service   2.171    1.403   0.750     5.520
income           2.454    0.710   1.720     4.250

[Figure: Histogram of tot pop; school; employ; health service; income, each panel with fitted normal curve and N 14 (tot pop: Mean 4.323, StDev 2.075; school: Mean 14.01, StDev 1.329; employ: Mean 1.952, StDev 0.8948; health service: Mean 2.171, StDev 1.403; income: Mean 2.454, StDev 0.7102)]

Factor Analysis: tot pop; school; employ; health service; income

Maximum Likelihood Factor Analysis of the Correlation Matrix
* NOTE * Heywood case

Unrotated Factor Loadings and Communalities

Variable         Factor1   Factor2   Communality
tot pop           0.971     0.160    0.968
school            0.494     0.833    0.938
employ            1.000     0.000    1.000
health service    0.848    -0.395    0.875
income           -0.249     0.375    0.202

Variance          2.9678    1.0159   3.9837
% Var             0.594     0.203    0.797

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable         Factor1   Factor2   Communality
tot pop           0.718     0.673    0.968
school           -0.052     0.967    0.938
employ            0.831     0.556    1.000
health service    0.924     0.143    0.875
income           -0.415     0.173    0.202

Variance          2.2354    1.7483   3.9837
% Var             0.447     0.350    0.797

Factor Score Coefficients

Variable         Factor1   Factor2
tot pop          -0.165     0.246
school           -0.528     0.789
employ            1.150     0.080
health service    0.116    -0.173
income           -0.018     0.027

a) What assumptions must be fulfilled to carry out the analyses above? 1p
b) What does the communality measure?
1p
c) Suggest names for the two factors. 1p
d) One observation is (5.935, 14.2, 2.265, 2.27, 2.91), and its standardized values are (0.777, 0.140, 0.350, 0.070, 0.642). Calculate the two factor scores. 1p
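
The arithmetic behind 5 d) can be checked with a short script. This is only a sketch, under the assumption that each factor score is the sum of the standardized values weighted by the corresponding column of the Factor Score Coefficients table; the names `COEFFICIENTS`, `Z` and `factor_scores` are illustrative and not part of the exam.

```python
# Factor score coefficients copied from the table in problem 5.
# Row order: tot pop, school, employ, health services, income.
# Column order: Factor1, Factor2.
COEFFICIENTS = [
    (-0.165, 0.246),
    (-0.528, 0.789),
    (1.150, 0.080),
    (0.116, -0.173),
    (-0.018, 0.027),
]

# Standardized values of the observation given in 5 d).
Z = [0.777, 0.140, 0.350, 0.070, 0.642]


def factor_scores(z, coefficients):
    """Return one score per factor as sum_i z[i] * coefficients[i][j].

    Assumption: scores are a plain linear combination of the
    standardized values, with no further centering or scaling.
    """
    n_factors = len(coefficients[0])
    return [
        sum(z_i * row[j] for z_i, row in zip(z, coefficients))
        for j in range(n_factors)
    ]


scores = factor_scores(Z, COEFFICIENTS)
print([round(s, 3) for s in scores])  # prints [0.197, 0.335]
```

The standardized values are used rather than the raw observation because the analysis was run on the correlation matrix.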