STAT 511
Assignment 5
Name ___________________
Spring 2001
Reading Assignment: Rencher, Chapter 15
Engel, The Analysis of Unbalanced Linear Models with Variance
Components (posted on the course web page)
Neter, Kutner, Nachtsheim & Wasserman, Applied Linear Statistical
Models, 4th Edition, Chapters 13, 24, 28, 29.
Written Assignment: Due Friday, April 6, in class.
1. Analyze the data from Problem 3 on Assignment 4, assuming that the block effects
corresponding to the sources of flour are random effects. Assume the model
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \tau_k + e_{ijk}$$
where $Y_{ijk}$ is the observed bread volume of 10 loaves of bread made with the i-th fat and the
j-th surfactant with flour from the k-th batch of flour. Here, we will consider the batches of
flour as samples from the possible batches of flour that could be delivered to this bakery, and
assume that $\tau_k \sim \mathrm{NID}(0, \sigma_\tau^2)$ and $e_{ijk} \sim \mathrm{NID}(0, \sigma_e^2)$ are independent sets of random effects.
A. Obtain REML estimates of the variance components $\sigma_\tau^2$ and $\sigma_e^2$.
B. Use the Type III sums of squares provided by PROC GLM in SAS to test the null
hypothesis of no interaction between the fat and surfactant factors. Show the value of
your test statistic, its degrees of freedom and the resulting p-value. State your conclusion.
C. Use the Type III sums of squares results from the PROC MIXED output to test the null
hypothesis of no interaction between the fat and surfactant factors.
D. Because of the imbalance caused by missing data, the test statistics are not the same in
parts B and C. Which is the more appropriate test to use? Explain.
E. Use the LSMEANS statement with the SLICE= option in PROC MIXED to examine
differences in mean volumes for the three fats for each surfactant. Does the same fat
produce the highest mean volume for each surfactant? Consider random variation in the
estimators for the means in answering this question.
F. Predict the values of the random effects of the four sources of flour selected for this
experiment. Which sources of flour are superior with respect to providing the largest
mean bread volumes? Justify your answer.
2. Four plants of the same variety were randomly sampled from a large field of plants. Three
leaves were randomly selected from each plant, and three separate determinations of the
concentration of a certain acid were made on each leaf. The data are presented in the
following table (NKNW, page 1160).
Plant   Leaf   Determination 1   Determination 2   Determination 3
  1      1          11.2              11.6              12.0
  1      2          16.5              16.8              16.1
  1      3          18.3              18.7              19.0
  2      1          14.1              13.8              14.2
  2      2          19.0              18.5              18.2
  2      3          11.9              12.4              12.0
  3      1          15.3              15.9              16.0
  3      2          19.5              20.1              19.3
  3      3          16.5              17.2              16.9
  4      1           7.3               7.8               7.0
  4      2           8.9               9.4               9.3
  4      3          11.3              10.9              10.5
These data are posted in the file pacid.dat on the course web page.
Consider the model
$$Y_{ijk} = \mu + P_i + L_{ij} + e_{ijk}$$
where $Y_{ijk}$ is the observed acid concentration for the k-th determination made on the j-th leaf
of the i-th plant and
$P_i \sim \mathrm{NID}(0, \sigma_P^2)$,
$L_{ij} \sim \mathrm{NID}(0, \sigma_L^2)$,
and
$e_{ijk} \sim \mathrm{NID}(0, \sigma_e^2)$,
and all random effects are independent of each other.
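A minimal Python sketch for checking hand calculations in Parts A to D is given below. It assumes the balanced layout of the table above (4 plants, 3 leaves per plant, 3 determinations per leaf), hard-codes the data rather than reading pacid.dat (whose file layout is not described here), and uses the usual expected mean squares for a balanced two-stage nested design. The names ACID, nested_anova and variance_components are illustrative only. For balanced data, REML estimates coincide with these method-of-moments (ANOVA) estimates whenever the latter are nonnegative, so the output should agree with PROC MIXED in that case; the SAS analysis remains the one requested.

```python
"""Balanced nested ANOVA for the acid-concentration data: a checking sketch."""
import math

# ACID[i][j] = the three determinations for leaf j+1 of plant i+1 (from the table above).
ACID = [
    [[11.2, 11.6, 12.0], [16.5, 16.8, 16.1], [18.3, 18.7, 19.0]],
    [[14.1, 13.8, 14.2], [19.0, 18.5, 18.2], [11.9, 12.4, 12.0]],
    [[15.3, 15.9, 16.0], [19.5, 20.1, 19.3], [16.5, 17.2, 16.9]],
    [[7.3, 7.8, 7.0], [8.9, 9.4, 9.3], [11.3, 10.9, 10.5]],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def nested_anova(data):
    """Sums of squares, df and mean squares for plants, leaves(plants) and error."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = mean(y for plant in data for leaf in plant for y in leaf)
    plant_means = [mean(y for leaf in plant for y in leaf) for plant in data]
    leaf_means = [[mean(leaf) for leaf in plant] for plant in data]

    ss_p = b * n * sum((m - grand) ** 2 for m in plant_means)
    ss_l = n * sum((leaf_means[i][j] - plant_means[i]) ** 2
                   for i in range(a) for j in range(b))
    ss_e = sum((y - leaf_means[i][j]) ** 2
               for i in range(a) for j in range(b) for y in data[i][j])
    ss_total = sum((y - grand) ** 2 for plant in data for leaf in plant for y in leaf)

    df_p, df_l, df_e = a - 1, a * (b - 1), a * b * (n - 1)
    return {
        "grand_mean": grand, "ss_total": ss_total,
        "ss_p": ss_p, "ss_l": ss_l, "ss_e": ss_e,
        "df_p": df_p, "df_l": df_l, "df_e": df_e,
        "ms_p": ss_p / df_p, "ms_l": ss_l / df_l, "ms_e": ss_e / df_e,
    }


def variance_components(res, b, n):
    """Moment estimates from the balanced-design expected mean squares:
    E(MS_P) = s2_e + n*s2_L + b*n*s2_P,  E(MS_L) = s2_e + n*s2_L,  E(MS_E) = s2_e.
    """
    return {
        "sigma2_e": res["ms_e"],
        "sigma2_L": (res["ms_l"] - res["ms_e"]) / n,
        "sigma2_P": (res["ms_p"] - res["ms_l"]) / (b * n),
    }


if __name__ == "__main__":
    res = nested_anova(ACID)
    for label, key in (("Plants", "p"), ("Leaves(plants)", "l"), ("Error", "e")):
        print(f"{label:15s} df={res['df_' + key]:2d} SS={res['ss_' + key]:9.4f} "
              f"MS={res['ms_' + key]:9.4f}")
    print(variance_components(res, b=3, n=3))
    # Var(grand mean) = E(MS_P) / (a*b*n), so the SE uses MS_P with a - 1 = 3 df;
    # a 95% interval is grand mean +/- t(0.975, 3) * SE.
    n_obs = sum(len(leaf) for plant in ACID for leaf in plant)
    se_mu = math.sqrt(res["ms_p"] / n_obs)
    print(f"grand mean = {res['grand_mean']:.4f}, SE = {se_mu:.4f}, df = {res['df_p']}")
```

The t quantile for Part D is left to a table or to SAS, since the Python standard library does not provide one.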
A. Obtain the ANOVA table.
B. Report formulas for the expectations of the mean squares.
C. Obtain REML estimates of the variance components $\sigma_P^2$, $\sigma_L^2$, and $\sigma_e^2$.
D. Construct a 95% confidence interval for µ, the mean acid concentration for the
population of plants.
E. Construct a 95% confidence interval for $\sigma_e^2$.
F. Construct a 95% confidence interval for $\sigma_L^2/\sigma_e^2$. (Hint: use an F-distribution.)
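The intervals in Parts E and F can be sketched as follows, assuming the model above with its balanced layout (4 plants, 3 leaves, 3 determinations), so that $MS_L$ (leaves within plants, 8 df) and $MS_E$ (determinations within leaves, 24 df) are independent. Here $\chi^2_{\nu,p}$ and $F_{\nu_1,\nu_2;p}$ denote lower-tail p-quantiles; a negative lower limit for the ratio is usually reported as 0.

```latex
\begin{align*}
\frac{SS_E}{\sigma_e^2} &\sim \chi^2_{24}
  &&\Longrightarrow\quad
  \left[\frac{SS_E}{\chi^2_{24,\,0.975}},\; \frac{SS_E}{\chi^2_{24,\,0.025}}\right]
  \text{ is a 95\% interval for } \sigma_e^2, \\
F^{*} = \frac{MS_L/(\sigma_e^2 + 3\sigma_L^2)}{MS_E/\sigma_e^2}
  &= \frac{R}{1 + 3\theta} \sim F_{8,24},
  && R = \frac{MS_L}{MS_E},\quad \theta = \frac{\sigma_L^2}{\sigma_e^2}, \\
F_{8,24;\,0.025} \le \frac{R}{1+3\theta} \le F_{8,24;\,0.975}
  &\iff \frac{1}{3}\left(\frac{R}{F_{8,24;\,0.975}} - 1\right)
   \le \theta \le \frac{1}{3}\left(\frac{R}{F_{8,24;\,0.025}} - 1\right).
\end{align*}
```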
3. The data in the following table are from an experiment where the amount of dry matter was
measured for wheat plants grown under conditions with different levels of moisture and
different amounts of fertilizer. There were 48 pots and 12 plastic trays used in the
experiment. The same soil mixture was used in each pot. Four pots were placed in each tray.
The levels of the moisture factor corresponded to adding 10, 20, 30, or 40 ml. of water per pot
per day to the tray. The water was absorbed through holes in the bottom of the pots.
Moisture levels were randomly assigned to trays with three trays assigned to each moisture
level.
Before planting the wheat seeds, fertilizer was added to the soil in the pots at levels of 2, 4, 6,
or 8 mg. per pot. The four levels of fertilizer were randomly assigned to the four pots within
each tray. An independent randomization was done within each tray. Then the same number of
wheat seeds were planted in each pot, and after 30 days the wheat plants were removed from
the pots and dried. The weight of the dry matter (in ounces) was recorded for each pot. The
observed weights are shown in the following table. (M&J, 1984)
                          Level of fertilizer (mg)
Moisture level    Tray       2          4          6          8
(ml/pot/day)
      10            1     3.3458     4.3170     4.5572     5.8794
      10            2     4.0444     4.1413     6.5173     7.3776
      10            3     1.9758     3.8397     4.4730     5.1180
      20            4     5.0490     7.9419    10.7697    13.5168
      20            5     5.9131     8.5129    10.3934    13.9157
      20            6     6.9511     7.0265    10.9334    15.2750
      30            7     6.5693    10.7348    12.2626    15.7133
      30            8     8.2974     8.9081    13.4373    14.9575
      30            9     5.2785     8.6654    11.1372    15.6332
      40           10     6.8393     9.0842    10.3654    12.5144
      40           11     6.4997     6.0702    10.7486    12.5034
      40           12     4.0482     3.8376     9.4367    10.2811
These data have been posted on the course web page as wheatw.dat.
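The sketch below is one way to check the split-plot ANOVA requested in Parts B and C and the cell means needed for the profile plots in Part F. It assumes the table above is transcribed correctly (three trays per moisture level, in tray order), hard-codes the data instead of parsing wheatw.dat, and pools the fertilizer by tray(moisture) interaction into the sub-plot error, which is the standard split-plot analysis for this layout. Variance components use the expected mean squares $E(MS_{T(M)}) = \sigma_e^2 + 4\sigma_\gamma^2$ and $E(MS_E) = \sigma_e^2$. Names such as WHEAT and split_plot_anova are hypothetical.

```python
"""Split-plot ANOVA for the wheat dry-matter data: a checking sketch."""

# WHEAT[i][j][k]: moisture level i (10, 20, 30, 40 ml/pot/day), tray j within that
# moisture level (trays 1-12 in table order), fertilizer level k (2, 4, 6, 8 mg).
WHEAT = [
    [[3.3458, 4.3170, 4.5572, 5.8794],
     [4.0444, 4.1413, 6.5173, 7.3776],
     [1.9758, 3.8397, 4.4730, 5.1180]],
    [[5.0490, 7.9419, 10.7697, 13.5168],
     [5.9131, 8.5129, 10.3934, 13.9157],
     [6.9511, 7.0265, 10.9334, 15.2750]],
    [[6.5693, 10.7348, 12.2626, 15.7133],
     [8.2974, 8.9081, 13.4373, 14.9575],
     [5.2785, 8.6654, 11.1372, 15.6332]],
    [[6.8393, 9.0842, 10.3654, 12.5144],
     [6.4997, 6.0702, 10.7486, 12.5034],
     [4.0482, 3.8376, 9.4367, 10.2811]],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def split_plot_anova(y):
    """Return (ss, df, ms, ss_total, cell_means) for a balanced split plot."""
    a, r, f = len(y), len(y[0]), len(y[0][0])
    grand = mean(v for moist in y for tray in moist for v in tray)
    moist_m = [mean(v for tray in moist for v in tray) for moist in y]
    tray_m = [[mean(tray) for tray in moist] for moist in y]
    fert_m = [mean(y[i][j][k] for i in range(a) for j in range(r)) for k in range(f)]
    cell = [[mean(y[i][j][k] for j in range(r)) for k in range(f)] for i in range(a)]

    ss = {
        "moisture": r * f * sum((m - grand) ** 2 for m in moist_m),
        "tray(moisture)": f * sum((tray_m[i][j] - moist_m[i]) ** 2
                                  for i in range(a) for j in range(r)),
        "fertilizer": a * r * sum((m - grand) ** 2 for m in fert_m),
        "moisture x fertilizer": r * sum(
            (cell[i][k] - moist_m[i] - fert_m[k] + grand) ** 2
            for i in range(a) for k in range(f)),
        # Sub-plot error = fertilizer x tray(moisture) interaction.
        "error": sum((y[i][j][k] - tray_m[i][j] - cell[i][k] + moist_m[i]) ** 2
                     for i in range(a) for j in range(r) for k in range(f)),
    }
    df = {
        "moisture": a - 1,
        "tray(moisture)": a * (r - 1),
        "fertilizer": f - 1,
        "moisture x fertilizer": (a - 1) * (f - 1),
        "error": a * (r - 1) * (f - 1),
    }
    ms = {source: ss[source] / df[source] for source in ss}
    ss_total = sum((v - grand) ** 2 for moist in y for tray in moist for v in tray)
    return ss, df, ms, ss_total, cell


def variance_components(ms, f):
    """Moment estimates from E(MS_tray) = s2_e + f*s2_gamma and E(MS_error) = s2_e."""
    return {
        "sigma2_e": ms["error"],
        "sigma2_gamma": (ms["tray(moisture)"] - ms["error"]) / f,
    }


if __name__ == "__main__":
    ss, df, ms, ss_total, cell = split_plot_anova(WHEAT)
    # Whole-plot factor is tested against trays; sub-plot terms against the error.
    denominators = {"moisture": "tray(moisture)", "fertilizer": "error",
                    "moisture x fertilizer": "error"}
    for source in ss:
        line = f"{source:22s} df={df[source]:2d} SS={ss[source]:9.4f} MS={ms[source]:8.4f}"
        if source in denominators:
            line += f" F={ms[source] / ms[denominators[source]]:7.3f}"
        print(line)
    print(variance_components(ms, f=4))
    for level, row in zip((10, 20, 30, 40), cell):  # cell means for the profile plots
        print(level, [round(m, 4) for m in row])
```

A negative estimate of $\sigma_\gamma^2$ from this moment method would differ from REML, which constrains it at zero.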
A. Identify the following features of this experiment, if they exist.
primary experimental units:
sub-plot units:
treatment factors:
blocking factors:
B. Consider the model
$$Y_{ijk} = \mu + \alpha_i + \gamma_{ij} + \tau_k + \delta_{ik} + e_{ijk}$$
where $Y_{ijk}$ is the observed dry matter weight for the wheat grown in the pot assigned to
the k-th level of fertilizer in the j-th tray assigned to the i-th level of the moisture factor.
Here $\gamma_{ij}$ and $e_{ijk}$ are random terms with
$e_{ijk} \sim \mathrm{NID}(0, \sigma_e^2)$
and
$\gamma_{ij} \sim \mathrm{NID}(0, \sigma_\gamma^2)$,
and any $e_{ijk}$ is independent of any $\gamma_{ij}$.
Report an ANOVA table for this model and give formulas for the expectation of the
mean squares.
C. Use the mean squares from the ANOVA table in Part B to obtain estimates of the
variance components $\sigma_e^2$ and $\sigma_\gamma^2$.
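D. With respect to the model in Part B, which of the following are estimable quantities?
$\mu + \tau_1$,
$\mu + \alpha_1 + \tau_1 + \delta_{11}$,
$\tau_1 - \tau_2$,
$\alpha_1 + \delta_{11} - \alpha_2 - \delta_{21}$,
$\delta_{11} - \delta_{13} - \delta_{21} + \delta_{23}$,
$\left(\alpha_1 + \frac{1}{4}\sum_{k=1}^{4}\delta_{1k}\right) - \left(\alpha_2 + \frac{1}{4}\sum_{k=1}^{4}\delta_{2k}\right)$.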
E. For any quantity that is estimable, use the value of the b.l.u.e. and the standard
error of the b.l.u.e. to construct a 95% confidence interval.
F. Construct a profile plot of the sample means for the various combinations of the
moisture and fertilizer factors with moisture level on the horizontal axis and one profile
for each fertilizer level. Construct a corresponding plot with fertilizer levels on the
horizontal axis and one profile for each moisture level. What do these plots suggest?
G. The first plot in Part F suggests that within each moisture level there is a straight-line
trend in the dry matter weight means across fertilizer levels. Use $\mu_{ik} = \mu + \alpha_i + \tau_k + \delta_{ik}$
to denote the mean dry matter weight for the i-th moisture level and the k-th fertilizer
level. Then orthogonal contrasts for linear and quadratic trends across fertilizer levels
within the i-th moisture level are
$$c_{l,i} = -3\mu_{i1} - \mu_{i2} + \mu_{i3} + 3\mu_{i4}$$
$$c_{q,i} = -\mu_{i1} + \mu_{i2} + \mu_{i3} - \mu_{i4}$$
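Estimates of these contrasts are
$$\hat{c}_{l,i} = -3\bar{Y}_{i\cdot 1} - \bar{Y}_{i\cdot 2} + \bar{Y}_{i\cdot 3} + 3\bar{Y}_{i\cdot 4}$$
$$\hat{c}_{q,i} = -\bar{Y}_{i\cdot 1} + \bar{Y}_{i\cdot 2} + \bar{Y}_{i\cdot 3} - \bar{Y}_{i\cdot 4}$$
Present formulas for the variances of $\hat{c}_{l,i}$ and $\hat{c}_{q,i}$.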
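A short derivation, assuming the model in Part B with three trays per moisture level, so that each $\bar{Y}_{i\cdot k}$ averages the three pots receiving fertilizer level k in the trays assigned to moisture level i. Because the contrast coefficients $c_k$ sum to zero and all four means share the same trays, the tray effects cancel:

```latex
\begin{align*}
\bar{Y}_{i\cdot k} &= \mu + \alpha_i + \tau_k + \delta_{ik} + \bar{\gamma}_{i\cdot} + \bar{e}_{i\cdot k},
\qquad \bar{\gamma}_{i\cdot} = \tfrac{1}{3}\textstyle\sum_{j=1}^{3}\gamma_{ij},
\quad \bar{e}_{i\cdot k} = \tfrac{1}{3}\textstyle\sum_{j=1}^{3} e_{ijk}, \\
\sum_{k=1}^{4} c_k = 0 \;&\Longrightarrow\;
\operatorname{Var}\Big(\sum_{k=1}^{4} c_k \bar{Y}_{i\cdot k}\Big)
  = \operatorname{Var}\Big(\sum_{k=1}^{4} c_k \bar{e}_{i\cdot k}\Big)
  = \frac{\sigma_e^2}{3}\sum_{k=1}^{4} c_k^2, \\
\operatorname{Var}(\hat{c}_{l,i}) &= \frac{(9+1+1+9)\,\sigma_e^2}{3} = \frac{20\,\sigma_e^2}{3},
\qquad
\operatorname{Var}(\hat{c}_{q,i}) = \frac{(1+1+1+1)\,\sigma_e^2}{3} = \frac{4\,\sigma_e^2}{3}.
\end{align*}
```

Substituting $\hat{\sigma}_e^2 = MS_E$, the sub-plot error mean square with 24 degrees of freedom in the split-plot ANOVA, gives estimated standard errors.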
H. Make a table of the values of $\hat{c}_{l,i}$ and $\hat{c}_{q,i}$ for all four moisture levels. Include the
value of the standard error of each contrast estimate, the value of the t-statistic for testing the null
hypothesis that the contrast is zero, and the degrees of freedom and p-value for each t-test. State your conclusions.
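I. Repeat Part G for linear and quadratic contrasts across moisture levels within a particular
fertilizer level. The estimated contrasts are
$$\hat{d}_{l,k} = -3\bar{Y}_{1\cdot k} - \bar{Y}_{2\cdot k} + \bar{Y}_{3\cdot k} + 3\bar{Y}_{4\cdot k}$$
$$\hat{d}_{q,k} = -\bar{Y}_{1\cdot k} + \bar{Y}_{2\cdot k} + \bar{Y}_{3\cdot k} - \bar{Y}_{4\cdot k}$$
Report formulas for the variances of $\hat{d}_{l,k}$ and $\hat{d}_{q,k}$.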
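Under the same assumptions, means at different moisture levels come from different trays, so $\bar{Y}_{1\cdot k}, \ldots, \bar{Y}_{4\cdot k}$ are independent and the tray effects no longer cancel. In the sketch below, $MS_{T(M)}$ is the trays-within-moisture mean square (8 df) and $MS_E$ the sub-plot error mean square (24 df); the degrees of freedom use a Satterthwaite approximation, which is one common choice rather than the only one.

```latex
\begin{align*}
\operatorname{Var}(\bar{Y}_{i\cdot k}) &= \frac{\sigma_\gamma^2 + \sigma_e^2}{3}, \\
\operatorname{Var}(\hat{d}_{l,k}) &= \frac{20\,(\sigma_\gamma^2 + \sigma_e^2)}{3},
\qquad
\operatorname{Var}(\hat{d}_{q,k}) = \frac{4\,(\sigma_\gamma^2 + \sigma_e^2)}{3}, \\
\widehat{\sigma_\gamma^2 + \sigma_e^2} &= \frac{MS_{T(M)} + 3\,MS_E}{4},
\qquad
\nu \approx \frac{\left(MS_{T(M)} + 3\,MS_E\right)^2}{MS_{T(M)}^2/8 + \left(3\,MS_E\right)^2/24}.
\end{align*}
```

The estimator in the last line is unbiased because $E(MS_{T(M)}) = \sigma_e^2 + 4\sigma_\gamma^2$ and $E(MS_E) = \sigma_e^2$.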
J. Repeat Part H for the contrasts defined in Part I. State your conclusions.
K. Given the results from the previous parts of this problem, a reasonable simplification of
the model in Part B appears to be
$$Y_{ijk} = \beta_0 + \beta_1 (X_{1i} - \bar{X}_{1\cdot}) + \beta_2 (X_{1i} - \bar{X}_{1\cdot})^2 + \beta_3 (X_{2k} - \bar{X}_{2\cdot}) + \beta_4 (X_{1i} - \bar{X}_{1\cdot})(X_{2k} - \bar{X}_{2\cdot}) + \beta_5 (X_{1i} - \bar{X}_{1\cdot})^2 (X_{2k} - \bar{X}_{2\cdot}) + \gamma_{ij} + e_{ijk}$$
where
$X_{1i}$ denotes the value of the i-th moisture level,
$X_{2k}$ denotes the value of the k-th fertilizer level,
and
$\gamma_{ij} \sim \mathrm{NID}(0, \sigma_\gamma^2)$ is independent of $e_{ijk} \sim \mathrm{NID}(0, \sigma_e^2)$.
Report REML estimates for $\sigma_e^2$ and $\sigma_\gamma^2$ and report a table of estimates, standard errors,
and tests of significance for $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5$.
L. Is the model in Part B a significant improvement over the model in Part K? Give a value
for your test statistic and state your conclusion.
4. In an experiment conducted at the Microelectronics Division of Lucent Technologies to study
variability in the manufacturing of certain microcircuits, the intensities of the electrical
current were measured at five voltage levels on each of eight sites of each of ten wafers.
Assume that the ten wafers were randomly selected from a very large population of wafers.
The data are posted in the file wafer.dat on the course web page. There are four hundred lines
in this file, with four values on each line corresponding to four variables in the following
order:
♦ Wafer: An identification number for each of the ten wafers.
♦ Site: An identification number for each of eight sites within a wafer.
♦ Voltage: The voltage level (Volts).
♦ Current: The measured intensity of the current (mA).
The objective is to model the mean intensity of the current as a function of voltage and to
identify an appropriate covariance structure for the random effects. Report a formula for your
model, estimates of the parameters for fixed effects and their standard errors. Report REML
estimates of variance components and correlations among random effects. Write a one-paragraph
summary of the conclusions from your analysis.