#7.26
a)
For each of the response variables, I first fit a main-effects model and then dropped the
insignificant variables. The models I finally chose are as follows:
Y1:
Y2:
Y3:
Y4:
The residual plots and leverage plots are shown below. Observations 51, 52, and 56 appear
to be potential outliers, with large negative residuals at the high fitted values in all four
models. The leverage plots indicate that observations 60 and 61 are influential points.
Finally, a prediction interval for the new observation with the given values of the covariates
is (1.26, 4.4).
b)
I fit the main-effects multivariate multiple linear regression model. The matrix of
estimated coefficients and the estimated error covariance matrix are shown below:
The residual plots for the four response variables are shown below. There are potential
outliers at the high fitted values of all four response variables.
The simultaneous prediction intervals for a new observation at the specified values of the
covariates are as follows:
The interval for SF is wider than the one calculated in 7.26(a). This is expected: these are
simultaneous prediction intervals, so they must be wider to maintain the same joint coverage.
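To make the comparison between parts (a) and (b) concrete, the two interval forms can be written side by side. This is only a sketch in standard textbook notation; the source does not state which formulas were used, so the symbols below are assumptions: n observations, r predictors in the fitted model, m = 4 responses, z_0 the vector of new covariate values, Z the design matrix, and s_ii the unbiased error variance estimate for response i. Since variables were deleted in part (a), r may differ between the two fits.

```latex
% One-at-a-time prediction interval for a single response Y_{0i}, as in part (a)
\[
\mathbf{z}_0'\hat{\boldsymbol\beta}_{(i)} \pm
t_{n-r-1}\!\left(\tfrac{\alpha}{2}\right)
\sqrt{\left(1+\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)s_{ii}}
\]

% Simultaneous prediction intervals for all m responses, as in part (b)
\[
\mathbf{z}_0'\hat{\boldsymbol\beta}_{(i)} \pm
\sqrt{\frac{m(n-r-1)}{n-r-m}\,F_{m,\,n-r-m}(\alpha)}\;
\sqrt{\left(1+\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)s_{ii}},
\qquad i = 1,\dots,m
\]
```

The two expressions share the same standard-error factor and differ only in the multiplier. For m = 1 the multipliers coincide, and for m = 4 the F-based multiplier is the larger one, which is consistent with the wider interval for SF.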
#7.27
Type II MANOVA Tests:
Sum of squares and products for error:
               assessment implementation
assessment          1.128         0.5730
implementation      0.573         2.0455
Term: severity
Sum of squares and products for the hypothesis:
               assessment implementation
assessment        4.68075       12.48200
implementation   12.48200       33.28533
Multivariate Tests: severity
                 Df test stat approx F num Df den Df     Pr(>F)
Pillai            1  0.943124  49.7464      2      6 0.00018399 ***
Wilks             1  0.056876  49.7464      2      6 0.00018399 ***
Hotelling-Lawley  1 16.582133  49.7464      2      6 0.00018399 ***
Roy               1 16.582133  49.7464      2      6 0.00018399 ***
Term: complexity
Sum of squares and products for the hypothesis:
               assessment implementation
assessment       37.07408       66.53325
implementation   66.53325      119.40075
Multivariate Tests: complexity
                 Df test stat approx F num Df den Df     Pr(>F)
Pillai            1   0.98548 203.5457      2      6 3.0642e-06 ***
Wilks             1   0.01452 203.5457      2      6 3.0642e-06 ***
Hotelling-Lawley  1  67.84857 203.5457      2      6 3.0642e-06 ***
Roy               1  67.84857 203.5457      2      6 3.0642e-06 ***
Term: experience
Sum of squares and products for the hypothesis:
               assessment implementation
assessment         11.532       26.25700
implementation     26.257       59.78408
Multivariate Tests: experience
                 Df test stat approx F num Df den Df     Pr(>F)
Pillai            1  0.968544 92.37208      2      6 3.1124e-05 ***
Wilks             1  0.031456 92.37208      2      6 3.1124e-05 ***
Hotelling-Lawley  1 30.790694 92.37208      2      6 3.1124e-05 ***
Roy               1 30.790694 92.37208      2      6 3.1124e-05 ***
Term: severity:complexity
Sum of squares and products for the hypothesis:
               assessment implementation
assessment          1.458         2.3490
implementation      2.349         3.7845
Multivariate Tests: severity:complexity
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1 0.6973225  6.91154      2      6 0.027729 *
Wilks             1 0.3026775  6.91154      2      6 0.027729 *
Hotelling-Lawley  1 2.3038466  6.91154      2      6 0.027729 *
Roy               1 2.3038466  6.91154      2      6 0.027729 *
Term: complexity:experience
Sum of squares and products for the hypothesis:
               assessment implementation
assessment          0.512         0.1760
implementation      0.176         0.0605
Multivariate Tests: complexity:experience
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1 0.3158987 1.385315      2      6 0.32016
Wilks             1 0.6841013 1.385315      2      6 0.32016
Hotelling-Lawley  1 0.4617718 1.385315      2      6 0.32016
Roy               1 0.4617718 1.385315      2      6 0.32016
Term: severity:experience
Sum of squares and products for the hypothesis:
               assessment implementation
assessment           0.20         0.2900
implementation       0.29         0.4205
Multivariate Tests: severity:experience
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1 0.2178130 0.8353998      2      6 0.47855
Wilks             1 0.7821870 0.8353998      2      6 0.47855
Hotelling-Lawley  1 0.2784666 0.8353998      2      6 0.47855
Roy               1 0.2784666 0.8353998      2      6 0.47855
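As a check on how the reported test statistics relate to the printed sums of squares and products, the severity term can be recomputed by hand. The following is a minimal sketch in plain Python, not the code that produced the output above. The function names are hypothetical, the matrices are copied from the error and severity SSP tables, and the F conversion and closed-form p-value assume a hypothesis with 1 degree of freedom and numerator df 2, as in this output.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def add2(a, b):
    """Elementwise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def trace_of_product(a, b):
    """trace(A B) for 2x2 matrices, computed as sum_ij a_ij * b_ji."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))


def manova_statistics(E, H, num_df=2, den_df=6):
    """Wilks, Pillai, Hotelling-Lawley, F and p for a 1-df hypothesis.

    Assumption: num_df == 2, so the upper tail of F(2, d) has the
    closed form (1 + 2F/d) ** (-d/2).
    """
    if num_df != 2:
        raise ValueError("closed-form p-value below requires num_df == 2")
    T = add2(E, H)
    wilks = det2(E) / det2(T)
    pillai = trace_of_product(H, inv2(T))
    hotelling_lawley = trace_of_product(H, inv2(E))
    f_stat = (1.0 - wilks) / wilks * den_df / num_df
    p_value = (1.0 + num_df * f_stat / den_df) ** (-den_df / 2.0)
    return {"wilks": wilks, "pillai": pillai,
            "hotelling_lawley": hotelling_lawley,
            "F": f_stat, "p": p_value}


# Values copied from the error and severity tables above.
E = [[1.128, 0.573], [0.573, 2.0455]]
H = [[4.68075, 12.482], [12.482, 33.28533]]
stats = manova_statistics(E, H)
print(stats)
```

Because each hypothesis here has a single degree of freedom, H has rank 1, so Roy's largest root equals the Hotelling-Lawley trace and all four tests give the same F and p-value, as the output shows.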
#8.10
a)
The sample covariance matrix S is shown below:
The sample principal components are
b)
The first principal component explains 52.9% of the total sample variance, the second
explains 27.1%, and the third explains 9.8%. The first component is a negatively weighted
sum of all five variables, with the greatest weights on Royal Dutch Shell and Exxon Mobil.
The second component is a contrast between the first three stocks (JP Morgan, Citibank,
Wells Fargo) and the last two (Royal Dutch Shell, Exxon Mobil). The third is a contrast
between JP Morgan and Exxon Mobil on one side and Citibank, Wells Fargo, and Royal
Dutch Shell on the other.
c)
The 90% confidence intervals for the variances of the first three population principal components are:
d) Since 89.88% of the total sample variance can be explained by the first three principal
components, I believe that the stock data can be summarized in three dimensions rather than
five dimensions without much loss of information.
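For part (c), intervals of this kind are commonly built from the large-sample normal approximation for the sample eigenvalues. The sketch below shows that construction under stated assumptions: the function name is hypothetical, the eigenvalues and sample size in the usage lines are made-up illustration values (not the stock-data results), and a Bonferroni split of the error rate across the m intervals is assumed.

```python
import math
from statistics import NormalDist


def eigenvalue_intervals(eigenvalues, n, level=0.90, bonferroni=True):
    """Large-sample intervals lam / (1 +/- z * sqrt(2 / n)) for each eigenvalue.

    With bonferroni=True the upper-tail area per interval is alpha / (2 m),
    so that the m intervals hold jointly with roughly the stated level.
    """
    m = len(eigenvalues)
    alpha = 1.0 - level
    tail = alpha / (2 * m) if bonferroni else alpha / 2
    z = NormalDist().inv_cdf(1.0 - tail)
    half = z * math.sqrt(2.0 / n)
    if half >= 1.0:
        raise ValueError("n is too small for the large-sample interval")
    return [(lam / (1.0 + half), lam / (1.0 - half)) for lam in eigenvalues]


# Hypothetical eigenvalues and sample size, for illustration only.
example_eigenvalues = [4.0, 2.0, 1.0]
joint = eigenvalue_intervals(example_eigenvalues, n=100)
separate = eigenvalue_intervals(example_eigenvalues, n=100, bonferroni=False)
for lam, (low, high) in zip(example_eigenvalues, joint):
    print(f"lambda_hat = {lam:.3f}: ({low:.3f}, {high:.3f})")
```

The intervals are not symmetric about the estimate, because the approximation is applied on a multiplicative scale.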
#8.18
a)
Shown below are the sample correlation matrix, the eigenvalues, and the eigenvectors:
R:
b)
The first two principal components are:
The cumulative percentages of the total variance explained by the first two principal
component vectors of the standardized data are 82.9% and 92.3%, respectively. Shown below
are correlations between the two component vectors and the standardized variables.
      100m    200m    400m    800m    1500m   3000m   Marathon
PC1   0.888   0.880   0.919   0.927   0.938   0.937   0.884
PC2   0.396   0.434   0.199  -0.126  -0.291  -0.281  -0.298
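The correlations and the percentages in part (b) can be cross-checked against each other. For a principal components analysis of a correlation matrix, the squared correlations between one component and the standardized variables sum to that component's eigenvalue, and dividing by the number of variables gives the proportion of variance explained. A minimal sketch, with the correlations copied from part (b) and variable names chosen for illustration:

```python
# Correlations between each component and the seven standardized variables,
# in the order 100m, 200m, 400m, 800m, 1500m, 3000m, Marathon.
pc1_corr = [0.888, 0.880, 0.919, 0.927, 0.938, 0.937, 0.884]
pc2_corr = [0.396, 0.434, 0.199, -0.126, -0.291, -0.281, -0.298]

p = len(pc1_corr)  # number of standardized variables

# Sum of squared correlations recovers the eigenvalue (up to rounding).
eigenvalue_1 = sum(r * r for r in pc1_corr)
eigenvalue_2 = sum(r * r for r in pc2_corr)

proportion_1 = eigenvalue_1 / p
cumulative_2 = (eigenvalue_1 + eigenvalue_2) / p

print(f"PC1 proportion:          {proportion_1:.3f}")
print(f"PC1 + PC2 cumulative:    {cumulative_2:.3f}")
```

Up to rounding in the printed correlations, the results agree with the 82.9% and 92.3% reported in part (b).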
c) The first component is the sum of all seven variables, with relatively equal weights. So it
might measure each country’s athletic ability. The second component is a contrast between the
first three distances and the last four distances. So it might measure the relative strength of
countries in the various running distances.
d) The top ten countries according to the first principal component are GDR, USSR, USA, Czech
Republic, FRG, GBNI, Poland, Canada, Finland, and Italy. The bottom ten countries are Singapore,
Indonesia, Dominican Republic, Malaysia, Costa Rica, Guatemala, Papua New Guinea, Mauritius,
Cook Islands, and Western Samoa. I am not surprised by the rankings according to the first
component vector.
#8.19
Perform a principal components analysis using the covariance matrix S of the speed data.
The cumulative percentages of the total variance explained by the first two principal
component vectors of the data are 83.7% and 92.1%, respectively. Correlations between the
two component vectors and the variables are shown below. The pattern of correlations is
essentially the same as in Problem 8.18, apart from the sign of the second component, which
is arbitrary, so the interpretations of the first two principal components remain the same.
      100m    200m    400m    800m    1500m   3000m   Marathon
PC1   0.869   0.864   0.893   0.917   0.947   0.949   0.927
PC2  -0.405  -0.448  -0.319  -0.008   0.154   0.191   0.281
The ranking of the nations by their scores on the first principal component is shown below;
it is similar to the ranking in Problem 8.18.
I prefer the first analysis, in Problem 8.18. Since the performances at different running
distances are not comparable, it is better to standardize the variables.
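The preference for standardizing can be illustrated with a two-variable toy case. The sketch below is not based on the track data: the covariance values are hypothetical, chosen so that one variable has a much larger variance than the other, and the helper name is made up. It uses the closed-form eigen-decomposition of a symmetric 2x2 matrix to compare the first component from the covariance matrix with the first component from the corresponding correlation matrix.

```python
import math


def first_pc_2x2(a, b, d):
    """Largest eigenvalue and unit eigenvector of [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam = mean + radius
    if b != 0:
        v = (b, lam - a)  # satisfies (A - lam I) v = 0 when b != 0
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)


# Hypothetical covariance matrix: variances 100 and 1, covariance 5.
var1, var2, cov = 100.0, 1.0, 5.0
corr = cov / math.sqrt(var1 * var2)  # correlation between the two variables

lam_s, load_s = first_pc_2x2(var1, cov, var2)  # covariance-based PCA
lam_r, load_r = first_pc_2x2(1.0, corr, 1.0)   # correlation-based PCA

share_s = lam_s / (var1 + var2)  # proportion of total variance, covariance scale
share_r = lam_r / 2.0            # proportion of total variance, standardized scale

print("covariance PC1 loadings: ", load_s, "share:", round(share_s, 3))
print("correlation PC1 loadings:", load_r, "share:", round(share_r, 3))
```

In the covariance-based analysis the first component is almost entirely the high-variance variable, while after standardizing both variables receive equal weight. How much this matters for a given dataset depends on how unequal the variances are.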
#8.28
a)
Scatterplots of the two specified pairs of variables are shown below. Based on these
scatterplots, I removed the four outliers (observations 25, 34, 69, 72) from the dataset.
b) Based on the cumulative proportions of variance explained by the principal component
vectors, which are (0.465 0.625 0.745 0.833 0.900 0.941 0.968 0.987 1.000), and the scree
plot shown below, I would choose to summarize this dataset with the first five principal
component vectors.
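The choice in part (b) can be reproduced from the cumulative proportions alone. The sketch below assumes a 90% cumulative-variance cutoff, which is an assumption made for illustration; the answer above also relies on the scree plot and does not state a numeric threshold. The function name is hypothetical.

```python
def components_needed(cumulative, threshold):
    """Smallest number of components whose cumulative proportion reaches threshold."""
    for k, value in enumerate(cumulative, start=1):
        if value >= threshold:
            return k
    raise ValueError("threshold is never reached")


# Cumulative proportions of variance reported in part (b).
cumulative = [0.465, 0.625, 0.745, 0.833, 0.900, 0.941, 0.968, 0.987, 1.000]

# Individual proportions, recovered by differencing the cumulative values.
individual = [round(c - prev, 3)
              for prev, c in zip([0.0] + cumulative[:-1], cumulative)]

k = components_needed(cumulative, threshold=0.90)  # assumed 90% cutoff
print("individual proportions:", individual)
print("components needed for 90%:", k)
```

With a different cutoff, for example 80%, the same function would return four components, so the threshold is a judgment call rather than a fixed rule.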
c) The first component vector appears to be a farm-size component, as it is positively
correlated with all the count variables and has approximately zero correlation with the
distance variable. The second component vector might differentiate between families that
focus more on crops and those that focus more on livestock, as it is positively correlated
with cattle and goats and negatively correlated with cotton and maize. The third component
vector is most directly related to distance. The fourth is positively related to millet and
negatively related to distance to road and cattle. The fifth might distinguish between
families that raise cattle and those that raise goats.