STAT 511                Assignment 5                Name ___________________
Spring 2001

Reading Assignment:
Rencher, Chapter 15
Engel, The Analysis of Unbalanced Linear Models with Variance Components (posted on the course web page)
Neter, Kutner, Nachtsheim & Wasserman, Applied Linear Statistical Models, 4th Edition, Chapters 13, 24, 28, 29.

Written Assignment: Due Friday, April 6, in class.

1. Analyze the data from Problem 3 on Assignment 4, assuming that the block effects corresponding to the sources of flour are random effects. Assume the model

$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \tau_k + e_{ijk}$

where $Y_{ijk}$ is the observed bread volume of 10 loaves of bread made with the i-th fat and the j-th surfactant with flour from the k-th batch of flour. Here, we will consider the batches of flour as samples from the possible batches of flour that could be delivered to this bakery, and assume that $\tau_k \sim NID(0, \sigma_\tau^2)$ and $e_{ijk} \sim NID(0, \sigma_e^2)$ are independent sets of random effects.

A. Obtain REML estimates of the variance components $\sigma_\tau^2$ and $\sigma_e^2$.
B. Use the Type III sums of squares provided by PROC GLM in SAS to test the null hypothesis of no interaction between the fat and surfactant factors. Show the value of your test statistic, its degrees of freedom, and the resulting p-value. State your conclusion.
C. Use the Type 3 tests of fixed effects from the PROC MIXED output to test the null hypothesis of no interaction between the fat and surfactant factors.
D. Because of the imbalance caused by missing data, the test statistics in parts B and C are not the same. Which is the more appropriate test to use? Explain.
E. Use the LSMEANS statement with the SLICE= option in PROC MIXED to examine differences in mean volumes for the three fats for each surfactant. Does the same fat produce the highest mean volume for each surfactant? Consider random variation in the estimators of the means in answering this question.
F. Predict the values of the random effects of the four sources of flour selected for this experiment.
Which sources of flour are superior with respect to providing the largest mean bread volumes? Justify your answer.

2. Four plants of the same variety were randomly sampled from a large field of plants. Three leaves were randomly selected from each plant, and three separate determinations of the concentration of a certain acid were made on each leaf. The data are presented in the following table (NKNW, page 1160).

Plant                    1                  2                  3                  4
Leaf               1     2     3      1     2     3      1     2     3      1     2     3
Determination
  1             11.2  16.5  18.3   14.1  19.0  11.9   15.3  19.5  16.5    7.3   8.9  11.3
  2             11.6  16.8  18.7   13.8  18.5  12.4   15.9  20.1  17.2    7.8   9.4  10.9
  3             12.0  16.1  19.0   14.2  18.2  12.0   16.0  19.3  16.9    7.0   9.3  10.5

These data are posted in the file pacid.dat on the course web page. Consider the model

$Y_{ijk} = \mu + P_i + L_{ij} + e_{ijk}$

where $Y_{ijk}$ is the observed acid concentration for the k-th determination made on the j-th leaf of the i-th plant, $P_i \sim NID(0, \sigma_P^2)$, $L_{ij} \sim NID(0, \sigma_L^2)$, and $e_{ijk} \sim NID(0, \sigma_e^2)$, and all random effects are independent of each other.

A. Obtain the ANOVA table.
B. Report formulas for the expectations of the mean squares.
C. Obtain REML estimates of the variance components $\sigma_P^2$, $\sigma_L^2$, and $\sigma_e^2$.
D. Construct a 95% confidence interval for $\mu$, the mean acid concentration for the population of plants.
E. Construct a 95% confidence interval for $\sigma_e^2$.
F. Construct a 95% confidence interval for $\sigma_L^2/\sigma_e^2$. (Hint: use an F-distribution.)

3. The data in the following table are from an experiment in which the amount of dry matter was measured for wheat plants grown under conditions with different levels of moisture and different amounts of fertilizer. There were 48 pots and 12 plastic trays used in the experiment. The same soil mixture was used in each pot. Four pots were placed in each tray. The levels of the moisture factor corresponded to adding 10, 20, 30, or 40 ml of water per pot per day to the tray. The water was absorbed through holes in the bottom of the pots.
Moisture levels were randomly assigned to trays, with three trays assigned to each moisture level. Before planting the wheat seeds, fertilizer was added to the soil in the pots at levels of 2, 4, 6, or 8 mg per pot. The four levels of fertilizer were randomly assigned to the four pots within each tray, with an independent randomization done within each tray. Then the same number of wheat seeds was planted in each pot, and after 30 days the wheat plants were removed from the pots and dried. The weight of the dry matter (in ounces) was recorded for each pot. The observed weights are shown in the following table (M&J, 1984).

        Moisture level (ml/pot/day)
                      10                      20                      30                      40
Tray           1       2       3       4       5       6       7       8       9      10      11      12
Fertilizer (mg)
  2       3.3458  4.0444  1.9758  5.0490  5.9131  6.9511  6.5693  8.2974  5.2785  6.8393  6.4997  4.0482
  4       4.3170  4.1413  3.8397  7.9419  8.5129  7.0265 10.7348  8.9081  8.6654  9.0842  6.0702  3.8376
  6       4.5572  6.5173  4.4730 10.7697 10.3934 10.9334 12.2626 13.4373 11.1372 10.3654 10.7486  9.4367
  8       5.8794  7.3776  5.1180 13.5168 13.9157 15.2750 15.7133 14.9575 15.6332 12.5144 12.5034 10.2811

These data have been posted on the course web page as wheatw.dat.

A. Identify the following features of this experiment, if they exist.
   primary experimental units:
   sub-plot units:
   treatment factors:
   blocking factors:
B. Consider the model

$Y_{ijk} = \mu + \alpha_i + \gamma_{ij} + \tau_k + \delta_{ik} + e_{ijk}$

where $Y_{ijk}$ is the observed dry matter weight for the wheat grown in the pot assigned to the k-th level of fertilizer in the j-th tray assigned to the i-th level of the moisture factor. Here $\gamma_{ij}$ and $e_{ijk}$ are random terms with $e_{ijk} \sim NID(0, \sigma_e^2)$ and $\gamma_{ij} \sim NID(0, \sigma_\gamma^2)$, and any $e_{ijk}$ is independent of any $\gamma_{ij}$. Report an ANOVA table for this model and give formulas for the expectations of the mean squares.
C. Use the mean squares from the ANOVA table in Part B to obtain estimates of the variance components $\sigma_e^2$ and $\sigma_\gamma^2$.
D. With respect to the model in Part B, which of the following are estimable quantities?
   $\mu + \tau_1$,
   $\mu + \alpha_1 + \tau_1 + \delta_{11}$,
   $\tau_1 - \tau_2$,
   $\alpha_1 + \delta_{11} - \alpha_2 - \delta_{21}$,
   $\delta_{11} - \delta_{13} - \delta_{21} + \delta_{23}$,
   $\left(\alpha_1 + \frac{1}{4}\sum_{k=1}^{4}\delta_{1k}\right) - \left(\alpha_2 + \frac{1}{4}\sum_{k=1}^{4}\delta_{2k}\right)$
For any quantity that is estimable, use the value of the b.l.u.e. and the standard error of the b.l.u.e. to construct a 95% confidence interval.
F. Construct a profile plot of the sample means for the various combinations of the moisture and fertilizer factors, with moisture level on the horizontal axis and one profile for each fertilizer level. Construct a corresponding plot with fertilizer levels on the horizontal axis and one profile for each moisture level. What do these plots suggest?
G. The first plot in Part F suggests that within each moisture level there is a straight-line trend in the dry matter weight means across fertilizer levels. Use $\mu_{ik} = \mu + \alpha_i + \tau_k + \delta_{ik}$ to denote the mean dry matter weight for the i-th moisture level and the k-th fertilizer level. Then orthogonal contrasts for linear and quadratic trends across fertilizer levels within the i-th moisture level are

$c_{l,i} = -3\mu_{i1} - \mu_{i2} + \mu_{i3} + 3\mu_{i4}$
$c_{q,i} = -\mu_{i1} + \mu_{i2} + \mu_{i3} - \mu_{i4}$

Estimates of these contrasts are

$\hat{c}_{l,i} = -3\bar{Y}_{i\cdot 1} - \bar{Y}_{i\cdot 2} + \bar{Y}_{i\cdot 3} + 3\bar{Y}_{i\cdot 4}$
$\hat{c}_{q,i} = -\bar{Y}_{i\cdot 1} + \bar{Y}_{i\cdot 2} + \bar{Y}_{i\cdot 3} - \bar{Y}_{i\cdot 4}$

Present formulas for the variances of $\hat{c}_{l,i}$ and $\hat{c}_{q,i}$.
H. Make a table of the values of $\hat{c}_{l,i}$ and $\hat{c}_{q,i}$ for all four moisture levels. Include the standard error of each contrast estimate, the value of the t-statistic for testing the null hypothesis that the contrast is zero, and the degrees of freedom and p-value for each t-test. State your conclusions.
I. Repeat Part G for linear and quadratic contrasts across moisture levels within a particular fertilizer level. The estimated contrasts are

$\hat{d}_{l,k} = -3\bar{Y}_{1\cdot k} - \bar{Y}_{2\cdot k} + \bar{Y}_{3\cdot k} + 3\bar{Y}_{4\cdot k}$
$\hat{d}_{q,k} = -\bar{Y}_{1\cdot k} + \bar{Y}_{2\cdot k} + \bar{Y}_{3\cdot k} - \bar{Y}_{4\cdot k}$

Report formulas for the variances of $\hat{d}_{l,k}$ and $\hat{d}_{q,k}$.
J. Repeat Part H for the contrasts defined in Part I. State your conclusions.
K. Given the results from the previous parts of this problem, a reasonable simplification of the model in Part B appears to be

$Y_{ijk} = \beta_0 + \beta_1 (X_{1i} - \bar{X}_{1\cdot}) + \beta_2 (X_{1i} - \bar{X}_{1\cdot})^2 + \beta_3 (X_{2k} - \bar{X}_{2\cdot}) + \beta_4 (X_{1i} - \bar{X}_{1\cdot})(X_{2k} - \bar{X}_{2\cdot}) + \beta_5 (X_{1i} - \bar{X}_{1\cdot})^2 (X_{2k} - \bar{X}_{2\cdot}) + \gamma_{ij} + e_{ijk}$

where $X_{1i}$ denotes the value of the i-th moisture level, $X_{2k}$ denotes the value of the k-th fertilizer level, and $\gamma_{ij} \sim NID(0, \sigma_\gamma^2)$ is independent of $e_{ijk} \sim NID(0, \sigma_e^2)$. Report REML estimates for $\sigma_e^2$ and $\sigma_\gamma^2$, and report a table of estimates, standard errors, and tests of significance for $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5$.
L. Is the model in Part B a significant improvement over the model in Part K? Give the value of your test statistic and state your conclusion.

4. In an experiment conducted at the Microelectronics Division of Lucent Technologies to study variability in the manufacturing of certain microcircuits, the intensity of the electrical current was measured at five voltage levels on each of eight sites on each of ten wafers. Assume that the ten wafers were randomly selected from a very large population of wafers. The data are posted in the file wafer.dat on the course web page. There are four hundred lines in this file, with four values on each line corresponding to four variables in the following order:
♦ Wafer: An identification number for each of the ten wafers.
♦ Site: An identification number for each of the eight sites within a wafer.
♦ Voltage: The voltage level (Volts).
♦ Current: The measured intensity of the current (mA).
The objective is to model the mean intensity of the current as a function of voltage and to identify an appropriate covariance structure for the random effects. Report a formula for your model, and estimates of the parameters for fixed effects and their standard errors. Report REML estimates of variance components and correlations among random effects. Write a one-paragraph summary of the conclusions from your analysis.
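Before fitting the mixed model for Problem 4, it may help to look at the raw current-voltage curves. The following Python sketch assumes that wafer.dat is whitespace-separated with the four columns in the order listed above (Wafer, Site, Voltage, Current); the helper names read_wafer_data, mean_current_by_voltage, and site_lines are hypothetical and not part of any course software. The per-site straight-line fits are only an exploratory device, and the linear trend is assumed for illustration: if intercepts and slopes differ noticeably across sites and wafers, random intercepts and slopes for wafers and for sites within wafers are worth trying in PROC MIXED. The sketch does not replace the REML analysis.

```python
"""Exploratory sketch for the wafer data (hypothetical helper names).

Assumes each line holds whitespace-separated values in the order
Wafer, Site, Voltage, Current.
"""
from collections import defaultdict
from statistics import mean


def read_wafer_data(text):
    """Return a list of (wafer, site, voltage, current) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        rows.append((int(fields[0]), int(fields[1]),
                     float(fields[2]), float(fields[3])))
    return rows


def mean_current_by_voltage(rows):
    """Average current at each voltage level, pooled over wafers and sites."""
    groups = defaultdict(list)
    for _, _, voltage, current in rows:
        groups[voltage].append(current)
    return {v: mean(c) for v, c in sorted(groups.items())}


def site_lines(rows):
    """Least-squares (intercept, slope) of current on voltage for each
    (wafer, site) curve. Each curve is assumed to have at least two
    distinct voltages (five in this design)."""
    curves = defaultdict(list)
    for wafer, site, voltage, current in rows:
        curves[(wafer, site)].append((voltage, current))
    fits = {}
    for key, pts in curves.items():
        xbar = mean(x for x, _ in pts)
        ybar = mean(y for _, y in pts)
        sxx = sum((x - xbar) ** 2 for x, _ in pts)
        sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
        slope = sxy / sxx
        fits[key] = (ybar - slope * xbar, slope)
    return fits
```

For example, after reading the file contents into a string text with Python's built-in open(), site_lines(read_wafer_data(text)) returns one (intercept, slope) pair per wafer-site curve. If the curves look clearly nonlinear in voltage, a quadratic or higher-order fixed-effect trend may be more appropriate than a straight line.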