Study Guide 4

Statistics 512
Study Guide 4
Spring 2002
1. One-way random effects model
A soil scientist conducted an experiment to investigate calcium content (mg/kg) in
tropical soils. Since there is a very large number of soil types in the study region of
interest, a random sample of eleven soil types was selected. For each soil type, four
locations were randomly selected where composite soil samples were collected and
analyzed in a lab for calcium content.
Answer the following questions.
a. Identify the design used in this experiment. Is factor soil fixed or random? Justify your
answer.
b. Give an appropriate model for describing soil calcium content. Explain each term in
the model.
c. What assumptions are required for this model?
d. Assuming ANOVA assumptions are satisfied, conduct an analysis of variance and test
the significance of the soil effect.
e. What are the estimates of the variance components in the model defined in (b)?
f. Calculate a 95% CI for the mean soil calcium content.
Table 1. PROC GLM. General Linear Models Procedure

Class Level Information
Class    Levels    Values
SOIL     11        1 2 3 4 5 6 7 8 9 10 11

Number of observations in data set = 44

Dependent Variable: CALCIUM
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              10    31248.681818      3124.868182    14.88      0.0001
Error              33    6929.750000       209.992424
Corrected Total    43    38178.431818

R-Square    C.V.        Root MSE     CALCIUM Mean
0.818490    9.342258    14.491115    155.11364

Source    DF    Type I SS       Mean Square    F Value    Pr > F
SOIL      10    31248.681818    3124.868182    14.88      0.0001

Source    DF    Type III SS     Mean Square    F Value    Pr > F
SOIL      10    31248.681818    3124.868182    14.88      0.0001
General Linear Models Procedure
Source    Type III Expected Mean Square
SOIL      Var(Error) + 4 Var(SOIL)
Table 2. The MIXED Procedure

Class Level Information
Class    Levels    Values
SOIL     11        1 2 3 4 5 6 7 8 9 10 11

REML Estimation Iteration History
Iteration    Evaluations    Objective       Criterion
0            1              338.70370365
1            1              303.70902144    0.00000000
Convergence criteria met.

Covariance Parameter Estimates (REML)
Cov Parm    Estimate
SOIL        728.71893939
Residual    209.99242424

Model Fitting Information for CALCIUM
Description                       Value
Observations                      44.0000
Res Log Likelihood                -191.369
Akaike's Information Criterion    -193.369
Schwarz's Bayesian Criterion      -195.130
-2 Res Log Likelihood             382.7377

ESTIMATE Statement Results
Parameter       Estimate        Std Error     DF    t        Pr > |t|
OVERALL MEAN    155.11363636    8.42732054    10    18.41    0.0001
2. Mixed effects model.
As part of performance tests of new car models, an experiment was conducted to
investigate the effect of car model on mileage per gallon. The same grade of
gasoline was used throughout the study. There were five models in this experiment: a
control (the current model) and four new models (A, B, C, and D). The same driver was used
in this experiment and he was given enough time to rest after each trial. Also, on each
day, the order in which the five models were used was randomized.
Each randomly selected car was driven over the same course at a testing ground in
Nevada. The entire experiment was conducted on 10 separate days, so overall there were
50 new cars used in this experiment, i.e. 10 identical cars per model. The response
variable of interest was mileage per gallon.
Answer the following questions.
a. What are the factors in this experiment? Are they fixed or random? Give a brief
justification of your answer in one sentence for each factor.
b. What type of design is used in this experiment? What is the experimental unit?
c. Write a linear model for this experiment and give the appropriate ANOVA
assumptions.
d. Should day be treated as a fixed or a random factor? Give a brief justification of your
answer in one sentence.
e. Estimate the variance components for this model.
f. Determine if there are significant day or car model effects.
g. Is there a significant difference between the control and the mean of the 4 new
models?
h. Calculate the standard error of the mean mileage per gallon for any car model in this
study.
i. Calculate the standard error of the difference between the mean mileage per gallon of
two different car models in this study.
Table 3. The MIXED Procedure

Class Level Information
Class       Levels    Values
CARMODEL    5         A B C D F
DAY         10        1 2 3 4 5 6 7 8 9 10

REML Estimation Iteration History
Iteration    Evaluations    Objective      Criterion
0            1              34.37111572
1            1              30.49595184    0.00000000
Convergence criteria met.

Covariance Parameter Estimates (REML)
Cov Parm    Estimate
DAY         0.14782222
Residual    0.46355556

Model Fitting Information for MILEAGE
Description                       Value
Observations                      50.0000
Res Log Likelihood                -56.6002
Akaike's Information Criterion    -58.6002
Schwarz's Bayesian Criterion      -60.4069
-2 Res Log Likelihood             113.2004

Tests of Fixed Effects
Source      NDF    DDF    Type III F    Pr > F
CARMODEL    4      36     19.14         0.0001

Coefficients for CONTROL & OTHERS
Effect       CARMODEL    Row 1
INTERCEPT                0
CARMODEL     A           0.25
CARMODEL     B           0.25
CARMODEL     C           0.25
CARMODEL     D           0.25
CARMODEL     F           -1

ESTIMATE Statement Results
Parameter           Estimate       Std Error     DF    t        Pr > |t|
CONTROL & OTHERS    -1.46250000    0.24071652    36    -6.08    0.0001

Least Squares Means
Effect      CARMODEL    LSMEAN         Std Error     DF    t        Pr > |t|
CARMODEL    A           19.75000000    0.24726055    36    79.88    0.0001
CARMODEL    B           20.23000000    0.24726055    36    81.82    0.0001
CARMODEL    C           21.27000000    0.24726055    36    86.02    0.0001
CARMODEL    D           19.50000000    0.24726055    36    78.86    0.0001
CARMODEL    F           21.65000000    0.24726055    36    87.56    0.0001
Table 4. General Linear Models Procedure

Class Level Information
Class       Levels    Values
CARMODEL    5         A B C D F
DAY         10        1 2 3 4 5 6 7 8 9 10

Number of observations in data set = 50

TESTING THE SIGNIFICANCE OF THE RANDOM FACTOR.
General Linear Models Procedure
Dependent Variable: MILEAGE
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              49    63.00000000       1.28571429     .          .
Error              0     .                 .
Corrected Total    49    63.00000000

R-Square    C.V.    Root MSE    MILEAGE Mean
1.000000    0       0           20.480000

Source          DF    Type I SS      Mean Square    F Value    Pr > F
DAY             9     10.82400000    1.20266667     .          .
CARMODEL        4     35.48800000    8.87200000     .          .
CARMODEL*DAY    36    16.68800000    0.46355556     .          .

Source          DF    Type III SS    Mean Square    F Value    Pr > F
DAY             9     10.82400000    1.20266667     .          .
CARMODEL        4     35.48800000    8.87200000     .          .
CARMODEL*DAY    36    16.68800000    0.46355556     .          .

General Linear Models Procedure
Source          Type III Expected Mean Square
DAY             Var(Error) + Var(CARMODEL*DAY) + 5 Var(DAY)
CARMODEL        Var(Error) + Var(CARMODEL*DAY) + Q(CARMODEL)
CARMODEL*DAY    Var(Error) + Var(CARMODEL*DAY)
3. Two-way ANOVA random effects model.
A group of food industry scientists conducted a study to investigate the effect of type of
packing material and brand of margarine on sale volumes. Because a large number of
margarine brands exist, a random sample of six brands was selected and
used in this study. There was also a large collection of types of packages available and a
random sample of five types of packages was used in this study. All treatment
combinations between levels of the two factors were sent to 90 stores owned by the food
industry. There were three stores per treatment combination in this experiment. After
three months, total sale volumes (thousands of US dollars) were recorded per brand per
package.
Answer the following questions.
a. What factors are used in this experiment? Are they random or fixed? Justify your
answer.
b. Write a linear model for this experiment.
c. Estimate the variance components for any random factors in your model.
d. Test the significance of the effects in the model defined in (b).
e. What is the covariance between two replicates with the same package and the same
brand?
f. What is the covariance between two replicates with different packages but with the
same brand?
g. What is the variance of a randomly selected observation?
h. Calculate a 95% CI for the overall mean sale volume.
Table 5. General Linear Models Procedure

Class Level Information
Class      Levels    Values
PACKAGE    5         1 2 3 4 5
BRAND      6         1 2 3 4 5 6

Number of observations in data set = 90

Dependent Variable: SALE
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              29    189483.43333      6533.91149     3.28       0.0001
Error              60    119368.66667      1989.47778
Corrected Total    89    308852.10000

Source           DF    Type I SS       Mean Square    F Value    Pr > F
PACKAGE          4     35311.48889     8827.87222     4.44       0.0033
BRAND            5     142554.50000    28510.90000    14.33      0.0001
PACKAGE*BRAND    20    11617.44444     580.87222      0.29       0.9983

Source           DF    Type III SS     Mean Square    F Value    Pr > F
PACKAGE          4     35311.48889     8827.87222     4.44       0.0033
BRAND            5     142554.50000    28510.90000    14.33      0.0001
PACKAGE*BRAND    20    11617.44444     580.87222      0.29       0.9983

Source           Type III Expected Mean Square
PACKAGE          Var(Error) + 3 Var(PACKAGE*BRAND) + 18 Var(PACKAGE)
BRAND            Var(Error) + 3 Var(PACKAGE*BRAND) + 15 Var(BRAND)
PACKAGE*BRAND    Var(Error) + 3 Var(PACKAGE*BRAND)

General Linear Models Procedure
Dependent Variable: SALE

Tests of Hypotheses using the Type III MS for PACKAGE*BRAND as an error term
Source     DF    Type III SS     Mean Square    F Value    Pr > F
PACKAGE    4     35311.488889    8827.872222    15.20      0.0001

Tests of Hypotheses using the Type III MS for PACKAGE*BRAND as an error term
Source     DF    Type III SS     Mean Square    F Value    Pr > F
BRAND      5     142554.50000    28510.90000    49.08      0.0001
Table 6. The MIXED Procedure

Class Level Information
Class      Levels    Values
PACKAGE    5         1 2 3 4 5
BRAND      6         1 2 3 4 5 6

REML Estimation Iteration History
Iteration    Evaluations    Objective       Criterion
0            1              819.02615701
1            2              773.20075100    0.00000611
2            1              773.19829908    0.00000002
3            1              773.19829078    0.00000000
Convergence criteria met.

Covariance Parameter Estimates (REML)
Cov Parm         Estimate
PACKAGE          399.47482616
BRAND            1791.5593069
PACKAGE*BRAND    0.00000000
Residual         1637.3270315

Model Fitting Information for SALE
Description                       Value
Observations                      90.0000
Res Log Likelihood                -468.385
Akaike's Information Criterion    -472.385
Schwarz's Bayesian Criterion      -477.362
-2 Res Log Likelihood             936.7693

ESTIMATE Statement Results
Parameter    Estimate        Std Error      DF    t        Pr > |t|
MEAN         559.10000000    19.91684477    4     28.07    0.0001
Biometry & Statistics 602
Solutions, Study Guide #12
Spring 1999
1. One-way random effects model
a. The design used in this experiment is a completely randomized design with a random
factor. Factor soil is random because it consists of soil types randomly selected from a
larger population of soils.
b. An appropriate model for describing soil calcium content is as follows:
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
$Y_{ij}$ = calcium content of the jth sample in the ith soil type; i = 1, 2, …, 11 and j = 1, 2, …, 4
$\mu$ = overall mean soil calcium content in the population
$\alpha_i$ = effect of the ith soil type; i = 1, 2, …, 11
$\varepsilon_{ij}$ = random error corresponding to the jth sample in the ith soil type
c. The assumptions for the linear model in (b) are as follows:
$\alpha_i$ and $\varepsilon_{ij}$ are independent.
$\alpha_i$ are assumed identically and independently normally distributed with mean 0 and
variance $\sigma_{\alpha}^2$.
$\varepsilon_{ij}$ are assumed identically and independently normally distributed with mean 0 and
variance $\sigma^2$.
d. We want to test H0: $\sigma_{\alpha}^2 = 0$ versus HA: $\sigma_{\alpha}^2 > 0$.
Since the design is balanced, results from PROC GLM are used. Results in Table 1 for
this test give F* = 14.88 and p-value = 0.0001. We reject H0: $\sigma_{\alpha}^2 = 0$ in favor of HA:
$\sigma_{\alpha}^2 > 0$ and we conclude that the effect of soil type on calcium content is highly
significant.
e. The estimates of the variance components are easily obtained using PROC MIXED.
Results are in Table 2 and lead to the following results:
The estimate of $\sigma_{\alpha}^2$, the soil type variance component, is $\hat{\sigma}_{\alpha}^2 = 728.719$.
The estimate of $\sigma^2$, the error variance component, is $\hat{\sigma}^2 = 209.992$.
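Because the design is balanced, the same values can be recovered by hand from the mean squares in Table 1 and the expected mean square Var(Error) + 4 Var(SOIL). The sketch below is only an arithmetic check using the ANOVA (method-of-moments) estimator; it is not the REML algorithm that PROC MIXED runs, and the variable names are illustrative.

```python
# Mean squares copied from Table 1 (PROC GLM output).
ms_soil = 3124.868182    # MS for SOIL, 10 df
ms_error = 209.992424    # MS for Error, 33 df
n_per_soil = 4           # locations sampled per soil type

# F statistic for H0: sigma_alpha^2 = 0.
f_star = ms_soil / ms_error

# E(MS_soil) = sigma^2 + n * sigma_alpha^2, so solve for sigma_alpha^2.
sigma2_error = ms_error
sigma2_soil = (ms_soil - ms_error) / n_per_soil

print(f"F* = {f_star:.2f}")
print(f"error variance estimate = {sigma2_error:.3f}")
print(f"soil variance estimate  = {sigma2_soil:.3f}")
```

For balanced data with a positive estimate, the moment estimate and the REML estimate in Table 2 agree.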
f. The 95% confidence interval for the overall mean soil calcium content is based on
results in Table 2.
Note that $SE(\bar{Y}_{..}) = \sqrt{\frac{MS_{soil}}{N}} = \sqrt{\frac{3124.867}{44}} = 8.427$.
$\bar{Y}_{..} \pm t(0.975; 10)\, SE(\bar{Y}_{..}) = 155.114 \pm 2.228 \times 8.427 = 155.114 \pm 18.776 = (136.34, 173.89)$
The 95% CI for the mean soil calcium content is between 136.34 mg/kg of soil and
173.89 mg/kg of soil.
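The standard error in Table 2 can also be written in terms of the variance components, as Var(Y-bar) = Var(SOIL)/11 + Var(Error)/44, which for balanced data equals MS soil / N. A minimal sketch of the interval calculation follows; the t quantile 2.228 is the value quoted in the solution rather than computed, and the variable names are illustrative.

```python
import math

n_soils, n_per_soil = 11, 4
sigma2_soil = 728.71893939    # REML estimate, Table 2
sigma2_error = 209.99242424   # REML estimate, Table 2
y_bar = 155.11363636          # overall mean, Table 2
t_crit = 2.228                # t(0.975; 10), as quoted in the solution

# Variance of the grand mean in a balanced one-way random effects model.
se_mean = math.sqrt(sigma2_soil / n_soils
                    + sigma2_error / (n_soils * n_per_soil))

half_width = t_crit * se_mean
lower, upper = y_bar - half_width, y_bar + half_width
print(f"SE = {se_mean:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The 10 degrees of freedom come from the soil mean square, since soil types, not individual samples, are the units that determine the precision of the overall mean.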
2. Mixed effects model
a. There are two factors in this experiment: factor car model and factor day. Factor car
model is fixed because interest is solely about the five car models used in this
experiment, including the control. Day is a random factor because we want the results of
this experiment to be valid for any day not just the ones in this experiment.
b. The design used is a randomized complete block design. Car is the experimental unit
because mileage per gallon is measured on a car.
c. A linear model for this experiment is as follows:
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
$Y_{ijk}$ = mileage per gallon for the ith car model on the jth day
$\mu$ = the population mean mileage
$\alpha_i$ = the effect of the ith car model; i = A, B, C, D, and F
$\beta_j$ = the effect of the jth day; j = 1, 2, …, 10
$(\alpha\beta)_{ij}$ = interaction between the ith car model and the jth day
$\varepsilon_{ijk}$ = effect of the random error corresponding to the kth car of the ith car model, on the jth
day
d. As mentioned in (a), day should be treated as a random factor. Justification is given in
(a).
e. Estimation of the variance components for the model is readily done in PROC MIXED (Table 3).
The estimate of the variance component for factor day is $\hat{\sigma}_{\beta}^2 = 0.1478$.
The estimate of the variance component for the interaction is $\hat{\sigma}_{\alpha\beta}^2 = 0.4636$
(the Residual line in Table 3).
The variance component for the random error, $\varepsilon_{ijk}$, cannot be estimated separately
because we have $n_{ij} = 1$. In other words, treatment combinations are not replicated.
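The day component can be cross-checked against the expected mean squares in Table 4, where the DAY mean square exceeds the CARMODEL*DAY mean square by 5 Var(DAY). The following is a hedged sketch of that moment calculation, not the REML fit itself; names are illustrative.

```python
# Type III mean squares from Table 4.
ms_day = 1.20266667
ms_interaction = 0.46355556   # also serves as the residual (one car per cell)
n_models = 5                  # car models observed on each day

# E(MS_day) - E(MS_interaction) = 5 * Var(DAY)
sigma2_day = (ms_day - ms_interaction) / n_models
print(f"day variance estimate = {sigma2_day:.8f}")
```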
f. Based on results in Table 4, the test statistic of H0: $\alpha_i = 0$ for testing the car model
effect gives:
$F^* = \frac{MS_{\text{car model}}}{MS_{\text{interaction}}} = \frac{8.87200000}{0.46355556} = 19.14$.
Because F* = 19.14 > F(0.95; 4, 36) = 2.6335, we reject H0: $\alpha_i = 0$ and we conclude that
the car model effect on mileage per gallon is statistically significant.
The test statistic of H0: $\sigma_{\beta}^2 = 0$ versus HA: $\sigma_{\beta}^2 > 0$ for testing the day effect gives:
$F^* = \frac{MS_{\text{day}}}{MS_{\text{interaction}}} = \frac{1.20266667}{0.46355556} = 2.59$
Because F* = 2.59 > F(0.95; 9, 36) = 2.1526, we reject H0: $\sigma_{\beta}^2 = 0$ and we conclude that
the day effect on mileage per gallon is statistically significant.
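Both ratios use the CARMODEL*DAY mean square as the denominator because, per Table 4, it is the only mean square whose expectation matches the numerator's under each null hypothesis. A small sketch of the comparison, with the critical values hard-coded from the solution rather than computed (the day test has 9 numerator degrees of freedom):

```python
# Type III mean squares from Table 4.
ms_carmodel = 8.87200000      # 4 df
ms_day = 1.20266667           # 9 df
ms_interaction = 0.46355556   # 36 df

f_carmodel = ms_carmodel / ms_interaction
f_day = ms_day / ms_interaction

# 0.95 quantiles of the F distribution as quoted in the solution.
crit_carmodel = 2.6335   # numerator df 4, denominator df 36
crit_day = 2.1526        # numerator df 9, denominator df 36

print(f"car model: F* = {f_carmodel:.2f}, reject H0: {f_carmodel > crit_carmodel}")
print(f"day:       F* = {f_day:.2f}, reject H0: {f_day > crit_day}")
```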
g. Results in Table 3 indicate that there is a significant difference between the control and
the average of the four other car models (estimate = -1.4625, t = -6.08, p-value = 0.0001).
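The CONTROL & OTHERS estimate in Table 3 can be reproduced from the least squares means and the contrast coefficients. Because the coefficients sum to zero, the day effect cancels and only the residual variance enters the standard error. The sketch below assumes the least squares means are listed in the order A, B, C, D, F; the dictionary names are illustrative.

```python
import math

# Least squares means and contrast coefficients from Table 3
# (F is the control; order A, B, C, D, F is assumed).
lsmeans = {"A": 19.75, "B": 20.23, "C": 21.27, "D": 19.50, "F": 21.65}
coef = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25, "F": -1.0}

residual_var = 0.46355556   # Residual estimate, Table 3
n_days = 10                 # each model mean averages 10 days

estimate = sum(coef[m] * lsmeans[m] for m in lsmeans)
se = math.sqrt(residual_var * sum(c * c for c in coef.values()) / n_days)
t_stat = estimate / se

print(f"estimate = {estimate:.4f}, SE = {se:.5f}, t = {t_stat:.2f}")
```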
h. The standard error of the mean mileage per gallon for car model A is:
$SE(\bar{Y}_{A.}) = \sqrt{\frac{\hat{\sigma}_{\alpha\beta}^2 + \hat{\sigma}_{\beta}^2}{\#\text{blocks}}} = \sqrt{\frac{0.46355556 + 0.14782222}{10}} = 0.24726$
i. The standard error of the difference in mean mileage per gallon between two different
car models i and j is
$SE(\bar{Y}_{i.} - \bar{Y}_{j.}) = \sqrt{\frac{2\hat{\sigma}_{\alpha\beta}^2}{\#\text{blocks}}} = \sqrt{\frac{2 \times 0.46355556}{10}} = 0.304485$
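The two standard errors differ in whether the day variance appears: a single model mean carries the day-to-day variation, while a difference between two models measured on the same days does not. A short numerical check using the Table 3 estimates (variable names are illustrative):

```python
import math

sigma2_day = 0.14782222        # DAY estimate, Table 3
sigma2_residual = 0.46355556   # Residual (interaction) estimate, Table 3
n_days = 10                    # number of blocks

# Mean of one car model over the 10 days: day and residual variation both count.
se_mean = math.sqrt((sigma2_residual + sigma2_day) / n_days)

# Difference of two model means over the same days: day effects cancel.
se_diff = math.sqrt(2 * sigma2_residual / n_days)

print(f"SE(mean) = {se_mean:.5f}")
print(f"SE(diff) = {se_diff:.6f}")
```

The first value matches the Std Error column of the Least Squares Means in Table 3.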
3. Two-way ANOVA random effects model.
a. The factors used in this experiment are type of packaging material and brand of margarine.
Both factors are random as each one consists of randomly selected levels from a larger
population of levels.
b. A linear model for this experiment is given below.
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
$\mu$ = the overall population mean
$\alpha_i$ = the effect of the ith type of package
$\beta_j$ = the effect of the jth brand of margarine
$(\alpha\beta)_{ij}$ = interaction between the ith type of package and the jth brand of margarine
$\varepsilon_{ijk}$ = the experimental error
The assumptions are as follows:
$\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, and $\varepsilon_{ijk}$ are independent.
Also $\alpha_i$ are identically and independently $N(0, \sigma_{\alpha}^2)$
$\beta_j$ are identically and independently $N(0, \sigma_{\beta}^2)$
$(\alpha\beta)_{ij}$ are identically and independently $N(0, \sigma_{\alpha\beta}^2)$
$\varepsilon_{ijk}$ are identically and independently $N(0, \sigma^2)$
c. Estimates of the variance components for the random factors are in the table below (from Table 6).

Covariance Parameter Estimates (REML)
Cov Parm         Estimate
PACKAGE          399.47482616
BRAND            1791.5593069
PACKAGE*BRAND    0.00000000
Residual         1637.3270315
The estimate of the variance component for package, $\hat{\sigma}_{\alpha}^2$ = 399.47482616
The estimate of the variance component for brand, $\hat{\sigma}_{\beta}^2$ = 1791.5593069
The estimate of the variance component for the interaction package*brand, $\hat{\sigma}_{\alpha\beta}^2$ = 0.00
The estimate of the variance component for the experimental error, $\hat{\sigma}^2$ = 1637.3270315
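The zero for the interaction is a boundary value: the moment estimate from Table 5 is negative, and REML does not allow negative variances. One way to see where the other REML values come from, offered here as a heuristic check and not as a description of the PROC MIXED algorithm, is to pool the interaction and error sums of squares and redo the moment calculations. The results land close to, but not exactly on, the REML estimates.

```python
# Sums of squares and mean squares from Table 5.
ss_interaction, df_interaction = 11617.44444, 20
ss_error, df_error = 119368.66667, 60
ms_package, ms_brand = 8827.87222, 28510.90000
n_rep, n_package, n_brand = 3, 5, 6

ms_interaction = ss_interaction / df_interaction
ms_error = ss_error / df_error

# Unconstrained moment estimate of the interaction component (negative here).
anova_interaction = (ms_interaction - ms_error) / n_rep

# With the interaction set to zero, pool it into the error term.
pooled_error = (ss_interaction + ss_error) / (df_interaction + df_error)
package_var = (ms_package - pooled_error) / (n_rep * n_brand)   # divisor 18
brand_var = (ms_brand - pooled_error) / (n_rep * n_package)     # divisor 15

print(f"moment estimate, interaction = {anova_interaction:.2f}")
print(f"pooled error = {pooled_error:.3f}")
print(f"package = {package_var:.3f}, brand = {brand_var:.3f}")
```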
d. Testing the significance of the effects in the model.
Test of the interaction effect is $F^* = \frac{MSAB}{MSE} = \frac{580.87222}{1989.47778} = 0.29$
Test of the effect of factor brand is $F^* = \frac{MS_{\text{brand}}}{MSAB} = \frac{28510.9}{580.87222} = 49.08$
Test of the effect of factor package is $F^* = \frac{MS_{\text{package}}}{MSAB} = \frac{8827.87222}{580.87222} = 15.20$
The interaction is not statistically significant (p-value = 0.9983).
The effect of brand is highly significant (p-value = 0.0001).
The effect of package is highly significant (p-value = 0.0001).
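The denominators follow from the expected mean squares in Table 5: each main effect mean square contains 3 Var(PACKAGE*BRAND), so the interaction mean square, not the error mean square, is the matching denominator. A minimal sketch of that bookkeeping, with illustrative dictionary names:

```python
# Type III mean squares from Table 5.
ms = {"package": 8827.87222, "brand": 28510.90000,
      "interaction": 580.87222, "error": 1989.47778}

# Denominator chosen so the two expected mean squares differ only by the
# variance component being tested.
denominator = {"interaction": "error",
               "brand": "interaction",
               "package": "interaction"}

f_ratios = {effect: ms[effect] / ms[denominator[effect]] for effect in denominator}
for effect, f_value in f_ratios.items():
    print(f"{effect:12s} F* = {f_value:.2f}  (denominator: MS {denominator[effect]})")
```

Dividing the main effect mean squares by the error mean square instead gives the default F values 4.44 and 14.33 printed in Table 5, which are not the appropriate tests for this random effects model.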
e. The covariance between the replicates with the same package and brand
= $\sigma_{\alpha}^2 + \sigma_{\beta}^2 + \sigma_{\alpha\beta}^2$
f. The covariance between samples with different packages but the same brand = $\sigma_{\beta}^2$.
g. The variance of a randomly selected observation = $\sigma_{\alpha}^2 + \sigma_{\beta}^2 + \sigma_{\alpha\beta}^2 + \sigma^2$
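Plugging the REML estimates from Table 6 into these expressions gives numerical values. This is only an illustrative substitution; the solution itself states the answers symbolically.

```python
# REML variance component estimates from Table 6.
var_package = 399.47482616
var_brand = 1791.5593069
var_interaction = 0.0
var_error = 1637.3270315

# Same package and same brand: all shared random effects contribute.
cov_same_cell = var_package + var_brand + var_interaction

# Different packages, same brand: only the brand effect is shared.
cov_same_brand = var_brand

# Variance of a single observation: sum of all components.
var_observation = var_package + var_brand + var_interaction + var_error

print(f"cov (same package, same brand)  = {cov_same_cell:.2f}")
print(f"cov (different package, same brand) = {cov_same_brand:.2f}")
print(f"variance of one observation     = {var_observation:.2f}")
```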
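h. The 95% CI for the overall population mean is, using the estimate, standard error, and DF = 4 from Table 6,
$\bar{Y}_{...} \pm t(0.975; 4)\, SE(\bar{Y}_{...}) = 559.10 \pm 2.776 \times 19.917 = 559.10 \pm 55.29 = (503.81, 614.39)$
We are 95% confident that the population mean sale volume is between $503,810 and
$614,390.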
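The standard error reported in Table 6 is consistent with the usual variance of the grand mean in a balanced two-way random effects model, where each component is divided by the number of independent levels it is averaged over. The sketch below assumes that formula and uses the standard t(0.975; 4) value of 2.776; variable names are illustrative.

```python
import math

# REML estimates and overall mean from Table 6.
var_package, var_brand = 399.47482616, 1791.5593069
var_interaction, var_error = 0.0, 1637.3270315
n_package, n_brand, n_rep = 5, 6, 3
y_bar = 559.10
t_crit = 2.776   # t(0.975; 4)

var_mean = (var_package / n_package
            + var_brand / n_brand
            + var_interaction / (n_package * n_brand)
            + var_error / (n_package * n_brand * n_rep))
se_mean = math.sqrt(var_mean)

half_width = t_crit * se_mean
lower, upper = y_bar - half_width, y_bar + half_width
# Sale volumes are in thousands of US dollars.
print(f"SE = {se_mean:.4f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```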