Exam in Multivariate Statistical Methods, 2011-06-03

Linköpings Universitet
IDA/Statistik
LH
732A37 Multivariate Statistical Methods, 6hp
Time allowed: 8-12
Allowed aids: Calculator. The book: Johnson, Wichern: Applied Multivariate Statistical Analysis. Notes in the book and a copy of the book are allowed.
Assisting teacher: Lotta Hallberg
Grades:
A=19-20 points, B=16-18p, C=12-15p, D=9-11p, E=6-8p
Provide detailed solutions that motivate your results.
_________________________________________________________________________________________
1
Let [X1, X2, …, X10] be a random sample of size n = 10 from an N3(μ, Σ) population. Specify each of the following completely:
i) The distribution of (X5 − μ)′Σ⁻¹(X5 − μ)    1p
ii) The distribution of √n(X̄ − μ)    1p
iii) The distribution of B(9S)B′, where    2p
    B = ( 1 0 0 )
        ( 0 0 1 )
2
Let X be N3(μ, Σ) where μ′ = [1, −1, 2] and
    (  4  0 −1 )
Σ = (  0  5  0 )
    ( −1  0  2 )
a) Find out if the following variables are independent. Explain.    2p
i) (X1, X3) and X2
ii) X1 and X1 + 3X2 − 2X3
b) Find the distribution of X1 + 3X2 − 2X3    2p
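The covariance calculations behind problem 2 can be checked with a short script. This is only a sketch in plain Python; the helper name `bilinear` and the variable names are illustrative and not part of the exam. It relies on two standard facts: for jointly normal variables, zero covariance is equivalent to independence, and a linear combination a′X of X ~ N3(μ, Σ) is normal with mean a′μ and variance a′Σa.

```python
# Mean vector and covariance matrix as stated in problem 2.
mu = [1.0, -1.0, 2.0]
Sigma = [[4.0, 0.0, -1.0],
         [0.0, 5.0, 0.0],
         [-1.0, 0.0, 2.0]]


def bilinear(a, M, b):
    """Return a' M b for plain Python lists (illustrative helper)."""
    return sum(a[i] * M[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


e1 = [1.0, 0.0, 0.0]   # picks out X1
a = [1.0, 3.0, -2.0]   # coefficients of X1 + 3*X2 - 2*X3

# a) i) covariances between X2 and each of X1 and X3
cov_x1_x2 = Sigma[0][1]
cov_x3_x2 = Sigma[2][1]

# a) ii) Cov(X1, a'X) = e1' Sigma a
cov_x1_comb = bilinear(e1, Sigma, a)

# b) a'X is univariate normal with mean a'mu and variance a' Sigma a
mean_comb = sum(ai * mi for ai, mi in zip(a, mu))
var_comb = bilinear(a, Sigma, a)

print(cov_x1_x2, cov_x3_x2, cov_x1_comb, mean_comb, var_comb)
```

A zero covariance in a) supports independence and a nonzero one rules it out; the written solution still has to state this argument.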
3
Below are some statistics for the variables Sepal length, Sepal width, Petal
length and Petal width from two different species of the Iris flower. We want
to test whether the mean vectors of these variables differ between the two
species, that is, to test H0: μ1 − μ2 = 0. Hotelling's T2-statistic shall be
used for this test. The questions follow the outputs.
Descriptive Statistics: Sepal length; Sepal width; Petal length; Petal width
Variable      Species  N   Mean  StDev
Sepal length  1        50  5,0   0,3525
              2        50  5,9   0,5162
Sepal width   1        50  3,4   0,3791
              2        50  2,8   0,3138
Petal length  1        50  1,4   0,1737
              2        50  4,3   0,4699
Petal width   1        50  0,2   0,1054
              2        50  1,3   0,1978
[Figure: Histogram of Sepal length; Sepal width; Petal length; Petal width, with fitted Normal curves. Panel variable: Species. Vertical axis: Frequency.]
Panel legends:
Panel            Mean   StDev   N
Sepal length; 1  5,006  0,3525  50
Sepal length; 2  5,936  0,5162  50
Sepal width; 1   3,428  0,3791  50
Sepal width; 2   2,77   0,3138  50
Petal length; 1  1,462  0,1737  50
Petal length; 2  4,26   0,4699  50
Petal width; 1   0,246  0,1054  50
Petal width; 2   (legend not legible)
Estimated covariance matrix, S1 (Species 1)
0,124249  0,099216  0,0163551  0,0103306
0,099216  0,143690  0,0116980  0,0092980
0,016355  0,011698  0,0301592  0,0060694
0,010331  0,009298  0,0060694  0,0111061
Estimated covariance matrix, S2 (Species 2)
0,266433  0,0851837  0,182898  0,0557796
0,085184  0,0984694  0,082653  0,0412041
0,182898  0,0826531  0,220816  0,0731020
0,055780  0,0412041  0,073102  0,0391061
Pooled covariance matrix, Sp
0,195341  0,092200  0,099627  0,0330551
0,092200  0,121080  0,047176  0,0252510
0,099627  0,047176  0,125488  0,0395857
0,033055  0,025251  0,039586  0,0251061
Inverted Sp, invSp
11,63   -6,55   -8,00    3,88
-6,55   14,24    3,27  -10,85
-8,00    3,27   21,50  -26,66
 3,88  -10,85  -26,66   87,67
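The relation between the printed matrices can be checked numerically. The following is a minimal sketch in plain Python, assuming the usual pooled estimate Sp = ((n1 − 1)S1 + (n2 − 1)S2)/(n1 + n2 − 2); the function names `pooled` and `matmul` are illustrative. Decimal commas from the output are written as decimal points.

```python
n1, n2 = 50, 50

S1 = [[0.124249, 0.099216, 0.0163551, 0.0103306],
      [0.099216, 0.143690, 0.0116980, 0.0092980],
      [0.016355, 0.011698, 0.0301592, 0.0060694],
      [0.010331, 0.009298, 0.0060694, 0.0111061]]
S2 = [[0.266433, 0.0851837, 0.182898, 0.0557796],
      [0.085184, 0.0984694, 0.082653, 0.0412041],
      [0.182898, 0.0826531, 0.220816, 0.0731020],
      [0.055780, 0.0412041, 0.073102, 0.0391061]]
Sp_listed = [[0.195341, 0.092200, 0.099627, 0.0330551],
             [0.092200, 0.121080, 0.047176, 0.0252510],
             [0.099627, 0.047176, 0.125488, 0.0395857],
             [0.033055, 0.025251, 0.039586, 0.0251061]]
invSp_listed = [[11.63, -6.55, -8.00, 3.88],
                [-6.55, 14.24, 3.27, -10.85],
                [-8.00, 3.27, 21.50, -26.66],
                [3.88, -10.85, -26.66, 87.67]]


def pooled(A, B, na, nb):
    """Weighted average of two covariance matrices with weights na-1 and nb-1."""
    wa, wb = na - 1, nb - 1
    size = len(A)
    return [[(wa * A[i][j] + wb * B[i][j]) / (wa + wb) for j in range(size)]
            for i in range(size)]


def matmul(A, B):
    """Plain matrix product of two square lists of lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


Sp = pooled(S1, S2, n1, n2)
# Largest gap between the recomputed and the printed pooled matrix.
max_dev = max(abs(Sp[i][j] - Sp_listed[i][j])
              for i in range(4) for j in range(4))

# Sp * invSp should be close to the identity, up to rounding of invSp.
product = matmul(Sp_listed, invSp_listed)
max_off = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(4) for j in range(4))

print(max_dev, max_off)
```

Both printed deviations should be small; they are not exactly zero because the output rounds invSp to two decimals.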
a) Which assumptions have to be made to perform Hotelling's T2-test? Are the assumptions fulfilled?    2p
b) Show what the components of the T2-statistic look like with these data. Specify the distribution. Show that the observed value of T2 is 4,29. Perform the test at the 5% significance level.    3p
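How the pieces of the two-sample statistic fit together can be sketched generically. This is a sketch only, assuming the pooled-covariance form T2 = (x̄1 − x̄2)′[(1/n1 + 1/n2)Sp]⁻¹(x̄1 − x̄2) from Johnson and Wichern and the reference distribution T2 ~ c·F(p, n1 + n2 − p − 1) under H0; the function names `hotelling_t2` and `f_scale` are illustrative, and the toy input is hypothetical rather than the exam data.

```python
def hotelling_t2(mean1, mean2, inv_sp, n1, n2):
    """Two-sample Hotelling T2 from an already inverted pooled covariance matrix."""
    d = [m1 - m2 for m1, m2 in zip(mean1, mean2)]
    quad = sum(d[i] * inv_sp[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    # [(1/n1 + 1/n2) Sp]^{-1} = (n1*n2 / (n1 + n2)) * Sp^{-1}
    return quad * n1 * n2 / (n1 + n2)


def f_scale(p, n1, n2):
    """Constant c such that T2 ~ c * F(p, n1 + n2 - p - 1) under H0."""
    return (n1 + n2 - 2) * p / (n1 + n2 - p - 1)


# Hypothetical toy input (not the exam data): two variables, identity inverse.
t2_toy = hotelling_t2([1.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 2, 2)

# Design of problem 3: p = 4 variables, n1 = n2 = 50 observations.
c_exam = f_scale(p=4, n1=50, n2=50)

print(t2_toy, c_exam)
```

The critical value is then c times the upper 5% point of F(p, n1 + n2 − p − 1), which can be read from the tables in the book.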
4
Let the random vector X = (X1, X2) have covariance matrix
Σ = ( 6 4 )
    ( 4 3 ).
Determine the principal components and find the proportion of the total
variance of X explained by the first component.    3p
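For a 2×2 covariance matrix the eigenvalues and eigenvectors have a closed form, so the principal components can be checked without any linear algebra library. The sketch below is plain Python; the names `lam1`, `lam2` and `unit_eigenvector` are illustrative. The principal components are the linear combinations e′X for the unit eigenvectors e, and the first one explains λ1/(λ1 + λ2) of the total variance.

```python
import math

# Sigma = [[a, b], [b, c]] as given in problem 4.
a, b, c = 6.0, 4.0, 3.0

trace = a + c
det = a * c - b * b
root = math.sqrt(trace ** 2 - 4.0 * det)

lam1 = (trace + root) / 2.0   # largest eigenvalue
lam2 = (trace - root) / 2.0   # smallest eigenvalue


def unit_eigenvector(lam):
    """Unit-length solution of (Sigma - lam*I) e = 0, valid because b != 0."""
    v = (b, lam - a)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


e1 = unit_eigenvector(lam1)   # coefficients of the first principal component
e2 = unit_eigenvector(lam2)   # coefficients of the second principal component

proportion_first = lam1 / (lam1 + lam2)

print(lam1, lam2, e1, e2, proportion_first)
```

The sign of each eigenvector is arbitrary, so a solution with both coefficients negated describes the same component.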
5
You are given five variables measured in 14 different counties:
• Total population (thousands)
• Median school years
• Total employment (thousands)
• Health services employment (hundreds)
• Median value homes ($10 000s) (income)
Descriptive Statistics: tot pop; school; employ; helth service; income
Variable       Mean    StDev  Minimum  Maximum
tot pop        4,323   2,075  1,523    8,044
school         14,014  1,329  12,200   17,000
employ         1,952   0,895  0,597    3,641
helth service  2,171   1,403  0,750    5,520
income         2,454   0,710  1,720    4,250
[Figure: Histogram of tot pop; school; employ; helth service; income, with fitted Normal curves. Vertical axis: Frequency.]
Panel legends:
Panel          Mean   StDev   N
tot pop        4,323  2,075   14
school         14,01  1,329   14
employ         1,952  0,8948  14
helth service  2,171  1,403   14
income         2,454  0,7102  14
Factor Analysis: tot pop; school; employ; helth service; income
Maximum Likelihood Factor Analysis of the Correlation Matrix
* NOTE * Heywood case
Unrotated Factor Loadings and Communalities
Variable       Factor1  Factor2  Communality
tot pop        0,971    0,160    0,968
school         0,494    0,833    0,938
employ         1,000    0,000    1,000
helth service  0,848    -0,395   0,875
income         -0,249   0,375    0,202
Variance       2,9678   1,0159   3,9837
% Var          0,594    0,203    0,797
Rotated Factor Loadings and Communalities
Varimax Rotation
Variable       Factor1  Factor2  Communality
tot pop        0,718    0,673    0,968
school         -0,052   0,967    0,938
employ         0,831    0,556    1,000
helth service  0,924    0,143    0,875
income         -0,415   0,173    0,202
Variance       2,2354   1,7483   3,9837
% Var          0,447    0,350    0,797
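The link between the loadings and the communality column can be verified directly from the output. The sketch below, in plain Python with illustrative names, assumes the standard definition for an orthogonal factor model: the communality of a variable is the sum of its squared loadings, which an orthogonal rotation such as varimax leaves unchanged. The "Variance" row is the corresponding column sum of squared loadings.

```python
# Loadings (Factor1, Factor2) copied from the output, decimal commas as points.
unrotated = {
    "tot pop": (0.971, 0.160),
    "school": (0.494, 0.833),
    "employ": (1.000, 0.000),
    "helth service": (0.848, -0.395),
    "income": (-0.249, 0.375),
}
rotated = {
    "tot pop": (0.718, 0.673),
    "school": (-0.052, 0.967),
    "employ": (0.831, 0.556),
    "helth service": (0.924, 0.143),
    "income": (-0.415, 0.173),
}
communality_listed = {
    "tot pop": 0.968, "school": 0.938, "employ": 1.000,
    "helth service": 0.875, "income": 0.202,
}


def communality(loadings):
    """Sum of squared loadings of one variable over all factors."""
    return sum(value * value for value in loadings)


for name in unrotated:
    print(name,
          round(communality(unrotated[name]), 3),
          round(communality(rotated[name]), 3),
          communality_listed[name])

# Variance explained by each rotated factor: column sums of squared loadings.
explained_rotated = [sum(rotated[name][k] ** 2 for name in rotated)
                     for k in range(2)]
print(explained_rotated)
```

Small differences in the third decimal come from the rounding of the printed loadings.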
Factor Score Coefficients
Variable       Factor1  Factor2
tot pop        -0,165   0,246
school         -0,528   0,789
employ         1,150    0,080
helth service  0,116    -0,173
income         -0,018   0,027
a) What assumptions have to be fulfilled to do the analyses above?    1p
b) What does the communality measure?    1p
c) Suggest names for the two factors.    1p
d) One observation is: (5,935 14,2 2,265 2,27 2,91)
and its standardized value is: (0,777 0,140 0,350 0,070 0,642).
Calculate the two factor scores.    1p
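The arithmetic in d) can be sketched as follows, assuming that each factor score is the sum of the standardized values multiplied by the corresponding factor score coefficients from the output. The code is plain Python and the variable names are illustrative.

```python
# Standardized observation from d), in the order
# tot pop, school, employ, helth service, income.
z = [0.777, 0.140, 0.350, 0.070, 0.642]

# Factor score coefficients copied from the output (decimal commas as points).
coef_factor1 = [-0.165, -0.528, 1.150, 0.116, -0.018]
coef_factor2 = [0.246, 0.789, 0.080, -0.173, 0.027]


def score(coefficients, values):
    """Weighted sum of standardized values for one factor."""
    return sum(c * v for c, v in zip(coefficients, values))


score1 = score(coef_factor1, z)
score2 = score(coef_factor2, z)

print(round(score1, 3), round(score2, 3))
```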