Multivariate Statistics. Assessment II, Fall 2011

Students registered for ST323 do Questions 2, 3 and 4.
MSc students and students registered for ST412 do Questions 1, 3 and 4.
You may use any software you like for numerical calculations of eigenvalues, eigenvectors, matrix inverses, correlation matrices, Q-Q plots and scatter plots, or any
other numerical calculation necessary. If you do so, describe which
software you used and attach the outputs. One useful statistical package
is Minitab, which you may download from
http://www2.warwick.ac.uk/services/its/servicessupport/software/list
Solutions are due by noon on January 14.
Question 1. A data set consists of n = 276 measurements of skull and bone size of
white fowl. Each measurement has six components: skull length, skull breadth,
femur length, tibia length, humerus length and ulna length, recorded as a vector
(X1, X2, X3, X4, X5, X6) in six dimensions. The sample correlation matrix is
given by
\[
\begin{pmatrix}
1.000 & 0.505 & 0.569 & 0.602 & 0.621 & 0.603 \\
      & 1.000 & 0.422 & 0.467 & 0.482 & 0.450 \\
      &       & 1.000 & 0.926 & 0.877 & 0.878 \\
      &       &       & 1.000 & 0.874 & 0.894 \\
      &       &       &       & 1.000 & 0.937 \\
      &       &       &       &       & 1.000
\end{pmatrix},
\]
where the lower triangle is left empty as the matrix is symmetric.
A. Give the principal component solution of the factor problem with two factors.
B. Is the choice of only two factors well justified, or would you choose a different
number of factors?
C. What proportion of the total variance is due to each of the two factors?
D. Compute the residual matrix and, based on it, comment on the efficiency of the
factor solution you provided.
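As a rough illustration of parts A and C, the sketch below assumes NumPy is the chosen software and uses the usual principal component method, in which the loadings for factor j are the j-th eigenvector scaled by the square root of its eigenvalue. The helper names `symmetric_from_upper` and `pc_factor_loadings` are hypothetical, introduced only for this example.

```python
import numpy as np

# Upper triangle of the sample correlation matrix from Question 1.
upper = [
    [1.000, 0.505, 0.569, 0.602, 0.621, 0.603],
    [1.000, 0.422, 0.467, 0.482, 0.450],
    [1.000, 0.926, 0.877, 0.878],
    [1.000, 0.874, 0.894],
    [1.000, 0.937],
    [1.000],
]

def symmetric_from_upper(rows):
    """Fill a full symmetric matrix from its upper-triangular rows."""
    p = len(rows)
    full = np.zeros((p, p))
    for i, row in enumerate(rows):
        full[i, i:] = row   # row i, from the diagonal to the right
        full[i:, i] = row   # mirror into column i
    return full

def pc_factor_loadings(S, m):
    """Principal component loadings for m factors, plus all eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(S)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    return loadings, eigvals

R = symmetric_from_upper(upper)
L, eigvals = pc_factor_loadings(R, m=2)

# For a correlation matrix the total variance is trace(R) = 6.
proportion = eigvals[:2] / np.trace(R)

print("loadings:\n", np.round(L, 3))
print("proportion of total variance per factor:", np.round(proportion, 3))
```

The signs of the eigenvectors returned by `eigh` are arbitrary, so a column of loadings may come out with all signs flipped relative to another package's output.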
Question 2. Consider the covariance matrix
\[
\begin{pmatrix}
11.072 &       &       \\
8.019  & 6.417 &       \\
8.160  & 6.005 & 6.773
\end{pmatrix},
\]
where the upper triangle is left empty as the matrix is symmetric.
A. Give the principal component solution of the factor problem with one factor.
B. Is the choice of only one factor well justified, or would you choose a different
number of factors?
C. What proportion of the total variance is due to that factor?
D. Compute the residual matrix and, based on it, comment on the efficiency of the
factor solution you provided.
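For part D, one common way to form the residual matrix is S - (LL' + Ψ), where L holds the principal component loadings and Ψ is the diagonal matrix of specific variances taken from the diagonal of S - LL'. The sketch below assumes NumPy and that convention. The variable names are illustrative only.

```python
import numpy as np

# Covariance matrix from Question 2, filled in symmetrically.
S = np.array([
    [11.072, 8.019, 8.160],
    [8.019, 6.417, 6.005],
    [8.160, 6.005, 6.773],
])

m = 1  # number of factors retained

eigvals, eigvecs = np.linalg.eigh(S)            # ascending eigenvalues
order = np.argsort(eigvals)[::-1]               # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

L = eigvecs[:, :m] * np.sqrt(eigvals[:m])       # principal component loadings
Psi = np.diag(np.diag(S - L @ L.T))             # specific variances
residual = S - (L @ L.T + Psi)                  # zero diagonal by construction

# The sum of squared residuals is at most the sum of squared
# discarded eigenvalues, which gives a quick check on the fit.
bound = np.sum(eigvals[m:] ** 2)

print("residual matrix:\n", np.round(residual, 4))
print("sum of squared residuals:", np.sum(residual ** 2))
print("upper bound from discarded eigenvalues:", bound)
```

Small off-diagonal residuals relative to the entries of S suggest that the retained factor reproduces the covariances well.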
Question 3. Consider the data table
Individual    X1     X2     X3
     1       3.7   48.5    9.3
     2       5.7   65.1    8.0
     3       3.8   47.2   10.9
     4       3.2   53.2   12.0
     5       3.1   55.5    9.7
     6       4.6   36.1    7.9
     7       2.4   24.8   14.0
     8       7.2   33.1    7.6
     9       6.7   47.4    8.5
    10       5.4   54.1   11.3
    11       3.9   36.9   12.7
    12       4.5   58.8   12.3
    13       3.5   27.8    9.8
    14       4.5   40.2    8.4
    15       1.5   13.5   10.1
    16       8.5   56.4    7.1
    17       4.5   71.6    8.2
    18       6.5   52.8   10.9
    19       4.1   44.1   11.2
    20       5.5   40.9    9.4
A. Construct normal probability plots (Q-Q plots) for the observations on each variable. Construct
the pairwise scatter plots. Does the assumption of multivariate normality seem to
be justified?
B. Determine the axes of the 90% confidence ellipsoid for µ. Determine the lengths
of these axes.
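For part B, a minimal sketch follows. It assumes NumPy and SciPy as the software, and the standard region n (x̄ - µ)' S^{-1} (x̄ - µ) ≤ p(n - 1)/(n - p) F_{p,n-p}(0.10), whose axes lie along the eigenvectors of the sample covariance matrix S. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Data table from Question 3: columns X1, X2, X3 for 20 individuals.
X = np.array([
    [3.7, 48.5, 9.3], [5.7, 65.1, 8.0], [3.8, 47.2, 10.9], [3.2, 53.2, 12.0],
    [3.1, 55.5, 9.7], [4.6, 36.1, 7.9], [2.4, 24.8, 14.0], [7.2, 33.1, 7.6],
    [6.7, 47.4, 8.5], [5.4, 54.1, 11.3], [3.9, 36.9, 12.7], [4.5, 58.8, 12.3],
    [3.5, 27.8, 9.8], [4.5, 40.2, 8.4], [1.5, 13.5, 10.1], [8.5, 56.4, 7.1],
    [4.5, 71.6, 8.2], [6.5, 52.8, 10.9], [4.1, 44.1, 11.2], [5.5, 40.9, 9.4],
])

n, p = X.shape
xbar = X.mean(axis=0)              # centre of the ellipsoid
S = np.cov(X, rowvar=False)        # sample covariance, divisor n - 1

alpha = 0.10
f_crit = stats.f.ppf(1 - alpha, p, n - p)        # upper 10% point of F(p, n - p)
c2 = p * (n - 1) / (n * (n - p)) * f_crit        # squared scaling constant

eigvals, eigvecs = np.linalg.eigh(S)             # columns of eigvecs are axis directions
half_lengths = np.sqrt(eigvals * c2)             # half-length along each axis

print("centre:", np.round(xbar, 3))
for lam, e, h in zip(eigvals, eigvecs.T, half_lengths):
    print("direction", np.round(e, 3), "half-length", round(h, 3), "full length", round(2 * h, 3))
```

The full length of each axis is twice the half-length printed above. Whether "length" in the question means the half-length or the full length is left to the reader's convention.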
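Question 4. Consider two normal populations $\pi_1$, $\pi_2$ with means $\mu_1^T = (10, 15)$, $\mu_2^T = (10, 25)$ and covariance matrices
\[
\Sigma_1 = \begin{pmatrix} 18 & 12 \\ 12 & 32 \end{pmatrix}
\quad\text{and}\quad
\Sigma_2 = \begin{pmatrix} 20 & -7 \\ -7 & 5 \end{pmatrix}.
\]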
Assume equal prior probabilities and misclassification costs of $c(2|1) = 10$ and
$c(1|2) = 73.89$.
A. Write down explicitly the posterior probabilities $P(\pi_1 \mid x)$ and $P(\pi_2 \mid x)$ for an
observation $x$.
B. Classify the set of observations $(10, 15)^T$, $(12, 17)^T$, $(14, 19)^T$, $(16, 21)^T$, $(18, 23)^T$.
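The two parts can be sketched numerically as follows, assuming NumPy and the usual minimum expected cost rule: allocate x to π1 when f1(x)/f2(x) ≥ (c(1|2)/c(2|1)) (p2/p1), and to π2 otherwise. With the costs given here the ratio c(1|2)/c(2|1) is 7.389, which is approximately e², so the threshold on the log density ratio is close to 2. The function names are hypothetical and this is an illustration, not a model answer.

```python
import numpy as np

mu1, mu2 = np.array([10.0, 15.0]), np.array([10.0, 25.0])
Sigma1 = np.array([[18.0, 12.0], [12.0, 32.0]])
Sigma2 = np.array([[20.0, -7.0], [-7.0, 5.0]])
prior1 = prior2 = 0.5          # equal prior probabilities
c21, c12 = 10.0, 73.89         # c(2|1) and c(1|2)

def log_density(x, mu, Sigma):
    """Log of the multivariate normal density at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = d @ np.linalg.solve(Sigma, d)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + quad)

def posteriors(x):
    """Posterior probabilities (P(pi1 | x), P(pi2 | x)) by Bayes' rule."""
    l1 = log_density(x, mu1, Sigma1) + np.log(prior1)
    l2 = log_density(x, mu2, Sigma2) + np.log(prior2)
    top = max(l1, l2)                      # subtract the max for numerical stability
    w1, w2 = np.exp(l1 - top), np.exp(l2 - top)
    return w1 / (w1 + w2), w2 / (w1 + w2)

def classify(x):
    """Return 1 or 2 under the minimum expected cost of misclassification rule."""
    log_ratio = log_density(x, mu1, Sigma1) - log_density(x, mu2, Sigma2)
    threshold = np.log((c12 / c21) * (prior2 / prior1))
    return 1 if log_ratio >= threshold else 2

observations = [(10, 15), (12, 17), (14, 19), (16, 21), (18, 23)]
for obs in observations:
    x = np.array(obs, dtype=float)
    p1, p2 = posteriors(x)
    print(obs, "P(pi1|x) =", round(p1, 4), "P(pi2|x) =", round(p2, 4), "-> population", classify(x))
```

Because the covariance matrices differ, the log density ratio is quadratic in x, so the posterior probabilities and the cost-based rule can disagree for points near the boundary.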