Statistics 512 Study Guide 4 Spring 2002

1. One-way random effects model

A soil scientist conducted an experiment to investigate calcium content (mg/kg) in tropical soils. Because the study region contains a very large number of soil types, a random sample of eleven soil types was selected. For each soil type, four locations were randomly selected, and composite soil samples were collected there and analyzed in a lab for calcium content. Answer the following questions.

a. Identify the design used in this experiment. Is factor soil fixed or random? Justify your answer.
b. Give an appropriate model for describing soil calcium content. Explain each term in the model.
c. What assumptions are required for this model?
d. Assuming ANOVA assumptions are satisfied, conduct an analysis of variance and test the significance of the soil effect.
e. What are the estimates of the variance components in the model defined in (b)?
f. Calculate a 95% CI for the mean soil calcium content.

Table 1. PROC GLM.

General Linear Models Procedure
Class Level Information

Class    Levels    Values
SOIL         11    1 2 3 4 5 6 7 8 9 10 11

Number of observations in data set = 44

Dependent Variable: CALCIUM

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              10      31248.681818    3124.868182      14.88    0.0001
Error              33       6929.750000     209.992424
Corrected Total    43      38178.431818

R-Square        C.V.     Root MSE    CALCIUM Mean
0.818490    9.342258    14.491115       155.11364

Source    DF       Type I SS    Mean Square    F Value    Pr > F
SOIL      10    31248.681818    3124.868182      14.88    0.0001

Source    DF     Type III SS    Mean Square    F Value    Pr > F
SOIL      10    31248.681818    3124.868182      14.88    0.0001

General Linear Models Procedure

Source    Type III Expected Mean Square
SOIL      Var(Error) + 4 Var(SOIL)

Table 2. The MIXED Procedure

Class Level Information

Class    Levels    Values
SOIL         11    1 2 3 4 5 6 7 8 9 10 11

REML Estimation Iteration History

Iteration    Evaluations       Objective     Criterion
        0              1    338.70370365
        1              1    303.70902144    0.00000000

Convergence criteria met.

Covariance Parameter Estimates (REML)

Cov Parm        Estimate
SOIL        728.71893939
Residual    209.99242424

Model Fitting Information for CALCIUM

Description                          Value
Observations                       44.0000
Res Log Likelihood                -191.369
Akaike's Information Criterion    -193.369
Schwarz's Bayesian Criterion      -195.130
-2 Res Log Likelihood             382.7377

ESTIMATE Statement Results

Parameter           Estimate     Std Error    DF        t    Pr > |t|
OVERALL MEAN    155.11363636    8.42732054    10    18.41      0.0001

2. Mixed effects model

As part of performance tests on new car models, an experiment was conducted to investigate the effect of car model on mileage per gallon. The same grade of gasoline was used throughout the study. There were five models in this experiment: a control (the current model) and four new models (A, B, C, and D). The same driver was used throughout, and he was given enough time to rest after each trial. On each day, the order in which the five models were driven was randomized. Each car was driven over the same course at a testing ground in Nevada. The entire experiment was conducted on 10 separate days, so 50 new cars were used overall, i.e. 10 identical cars per model. The response variable of interest was mileage per gallon. Answer the following questions.

a. What are the factors in this experiment? Are they fixed or random? Give a brief justification of your answer in one sentence for each factor.
b. What type of design is used in this experiment? What is the experimental unit?
c. Write a linear model for this experiment and give the appropriate ANOVA assumptions.
d. Should day be treated as a fixed or a random factor? Give a brief justification of your answer in one sentence.
e. Estimate the variance components for this model.
f. Determine if there are significant day or car model effects.
g.
Is there a significant difference between the control and the mean of the 4 new models?
h. Calculate the standard error of the mean mileage per gallon for any car model in this study.
i. Calculate the standard error of the difference between the mean mileage per gallon of two different car models in this study.

Table 3. The MIXED Procedure

Class Level Information

Class       Levels    Values
CARMODEL         5    A B C D F
DAY             10    1 2 3 4 5 6 7 8 9 10

REML Estimation Iteration History

Iteration    Evaluations      Objective     Criterion
        0              1    34.37111572
        1              1    30.49595184    0.00000000

Convergence criteria met.

Covariance Parameter Estimates (REML)

Cov Parm      Estimate
DAY         0.14782222
Residual    0.46355556

Model Fitting Information for MILEAGE

Description                          Value
Observations                       50.0000
Res Log Likelihood                -56.6002
Akaike's Information Criterion    -58.6002
Schwarz's Bayesian Criterion      -60.4069
-2 Res Log Likelihood             113.2004

Tests of Fixed Effects

Source      NDF    DDF    Type III F    Pr > F
CARMODEL      4     36         19.14    0.0001

Coefficients for CONTROL & OTHERS

Effect       CARMODEL    Row 1
INTERCEPT                    0
CARMODEL     A            0.25
CARMODEL     B            0.25
CARMODEL     C            0.25
CARMODEL     D            0.25
CARMODEL     F              -1

ESTIMATE Statement Results

Parameter              Estimate     Std Error    DF        t    Pr > |t|
CONTROL & OTHERS    -1.46250000    0.24071652    36    -6.08      0.0001

Least Squares Means

Effect      CARMODEL         LSMEAN     Std Error    DF        t    Pr > |t|
CARMODEL    A           19.75000000    0.24726055    36    79.88      0.0001
CARMODEL    B           20.23000000    0.24726055    36    81.82      0.0001
CARMODEL    C           21.27000000    0.24726055    36    86.02      0.0001
CARMODEL    D           19.50000000    0.24726055    36    78.86      0.0001
CARMODEL    F           21.65000000    0.24726055    36    87.56      0.0001

Table 4. General Linear Models Procedure

Class Level Information

Class       Levels    Values
CARMODEL         5    A B C D F
DAY             10    1 2 3 4 5 6 7 8 9 10

Number of observations in data set = 50

TESTING THE SIGNIFICANCE OF THE RANDOM FACTOR.
General Linear Models Procedure

Dependent Variable: MILEAGE

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              49       63.00000000     1.28571429          .         .
Error               0                 .              .
Corrected Total    49       63.00000000

R-Square    C.V.    Root MSE    MILEAGE Mean
1.000000       0           0       20.480000

Source          DF      Type I SS    Mean Square    F Value    Pr > F
DAY              9    10.82400000     1.20266667          .         .
CARMODEL         4    35.48800000     8.87200000          .         .
CARMODEL*DAY    36    16.68800000     0.46355556          .         .

Source          DF    Type III SS    Mean Square    F Value    Pr > F
DAY              9    10.82400000     1.20266667          .         .
CARMODEL         4    35.48800000     8.87200000          .         .
CARMODEL*DAY    36    16.68800000     0.46355556          .         .

General Linear Models Procedure

Source          Type III Expected Mean Square
DAY             Var(Error) + Var(CARMODEL*DAY) + 5 Var(DAY)
CARMODEL        Var(Error) + Var(CARMODEL*DAY) + Q(CARMODEL)
CARMODEL*DAY    Var(Error) + Var(CARMODEL*DAY)

3. Two-way ANOVA random effects model.

A group of food industry scientists conducted a study to investigate the effect of type of packaging material and brand of margarine on sale volumes. Because a large number of margarine brands exist, a random sample of six brands was selected and used in this study. A large collection of package types was also available, and a random sample of five package types was used. All treatment combinations of the levels of the two factors were sent to 90 stores owned by the food industry, with three stores per treatment combination. After three months, total sale volumes (thousands of US dollars) were recorded per brand per package. Answer the following questions.

a. What factors are used in this experiment? Are they random or fixed? Justify your answer.
b. Write a linear model for this experiment.
c. Estimate the variance components for any random factors in your model.
d. Test the significance of the effects in the model defined in (b).
e.
What is the covariance between two replicates with the same package and the same brand?
f. What is the covariance between two replicates with different packages but with the same brand?
g. What is the variance of a randomly selected observation?
h. Calculate a 95% CI for the overall mean sale volume.

Table 5. General Linear Models Procedure

Class Level Information

Class      Levels    Values
PACKAGE         5    1 2 3 4 5
BRAND           6    1 2 3 4 5 6

Number of observations in data set = 90

Dependent Variable: SALE

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              29      189483.43333     6533.91149       3.28    0.0001
Error              60      119368.66667     1989.47778
Corrected Total    89      308852.10000

Source           DF       Type I SS    Mean Square    F Value    Pr > F
PACKAGE           4     35311.48889     8827.87222       4.44    0.0033
BRAND             5    142554.50000    28510.90000      14.33    0.0001
PACKAGE*BRAND    20     11617.44444      580.87222       0.29    0.9983

Source           DF     Type III SS    Mean Square    F Value    Pr > F
PACKAGE           4     35311.48889     8827.87222       4.44    0.0033
BRAND             5    142554.50000    28510.90000      14.33    0.0001
PACKAGE*BRAND    20     11617.44444      580.87222       0.29    0.9983

Source           Type III Expected Mean Square
PACKAGE          Var(Error) + 3 Var(PACKAGE*BRAND) + 18 Var(PACKAGE)
BRAND            Var(Error) + 3 Var(PACKAGE*BRAND) + 15 Var(BRAND)
PACKAGE*BRAND    Var(Error) + 3 Var(PACKAGE*BRAND)

General Linear Models Procedure

Dependent Variable: SALE

Tests of Hypotheses using the Type III MS for PACKAGE*BRAND as an error term

Source     DF     Type III SS    Mean Square    F Value    Pr > F
PACKAGE     4    35311.488889    8827.872222      15.20    0.0001

Tests of Hypotheses using the Type III MS for PACKAGE*BRAND as an error term

Source    DF     Type III SS    Mean Square    F Value    Pr > F
BRAND      5    142554.50000    28510.90000      49.08    0.0001

Table 6. The MIXED Procedure

Class Level Information

Class      Levels    Values
PACKAGE         5    1 2 3 4 5
BRAND           6    1 2 3 4 5 6

REML Estimation Iteration History

Iteration    Evaluations       Objective     Criterion
        0              1    819.02615701
        1              2    773.20075100    0.00000611
        2              1    773.19829908    0.00000002
        3              1    773.19829078    0.00000000

Convergence criteria met.
Covariance Parameter Estimates (REML)

Cov Parm              Estimate
PACKAGE           399.47482616
BRAND             1791.5593069
PACKAGE*BRAND       0.00000000
Residual          1637.3270315

Model Fitting Information for SALE

Description                          Value
Observations                       90.0000
Res Log Likelihood                -468.385
Akaike's Information Criterion    -472.385
Schwarz's Bayesian Criterion      -477.362
-2 Res Log Likelihood             936.7693

ESTIMATE Statement Results

Parameter        Estimate      Std Error    DF        t    Pr > |t|
MEAN         559.10000000    19.91684477     4    28.07      0.0001

Biometry & Statistics 602 Solutions, Study Guide #12 Spring 1999

1. One-way random effects model

a. The design used in this experiment is a completely randomized design with a random factor. Factor soil is random because its levels are soil types randomly selected from a larger population of soil types.

b. An appropriate model for describing soil calcium content is as follows:

$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

$Y_{ij}$ = calcium content of the jth sample in the ith soil type; i = 1, 2, …, 11; j = 1, 2, …, 4
$\mu$ = overall mean soil calcium content in the population
$\alpha_i$ = effect of the ith soil type; i = 1, 2, …, 11
$\varepsilon_{ij}$ = random error corresponding to the jth sample in the ith soil type

c. The assumptions for the linear model in (b) are as follows:
The $\alpha_i$ and $\varepsilon_{ij}$ are independent.
The $\alpha_i$ are independently and identically normally distributed with mean 0 and variance $\sigma_\alpha^2$.
The $\varepsilon_{ij}$ are independently and identically normally distributed with mean 0 and variance $\sigma^2$.

d. We want to test $H_0: \sigma_\alpha^2 = 0$ versus $H_A: \sigma_\alpha^2 > 0$. Since the design is balanced, the results from PROC GLM are used. Table 1 gives F* = 14.88 with p-value = 0.0001 for this test. We reject $H_0: \sigma_\alpha^2 = 0$ in favor of $H_A: \sigma_\alpha^2 > 0$ and conclude that the effect of soil type on calcium content is highly significant.

e. The estimates of the variance components are easily obtained using PROC MIXED.
Results are in Table 2 and lead to the following estimates:
The estimate of $\sigma_\alpha^2$, the soil type variance component, is $\hat\sigma_\alpha^2 = 728.719$.
The estimate of $\sigma^2$, the error variance component, is $\hat\sigma^2 = 209.992$.

f. The 95% confidence interval for the overall mean soil calcium content is based on the results in Table 2. Note that

$SE(\bar{Y}_{..}) = \sqrt{MS_{soil} / N} = \sqrt{3124.868 / 44} = 8.427$

$\bar{Y}_{..} \pm t(0.975; 10) \, SE(\bar{Y}_{..}) = 155.114 \pm 2.228 \times 8.427 = 155.114 \pm 18.776 = (136.34, 173.89)$

The 95% CI for the mean soil calcium content is between 136.34 mg/kg of soil and 173.89 mg/kg of soil.

2. Mixed effects model

a. There are two factors in this experiment: factor car model and factor day. Factor car model is fixed because interest lies solely in the five car models used in this experiment, including the control. Day is a random factor because we want the results of this experiment to be valid for any day, not just the ones in this experiment.

b. The design used is a randomized complete block design, with day as the block. Car is the experimental unit because mileage per gallon is measured on a car.

c. A linear model for this experiment is as follows:

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

$Y_{ijk}$ = mileage per gallon for the ith car model on the jth day
$\mu$ = the population mean mileage
$\alpha_i$ = the effect of the ith car model; i = A, B, C, D, and F
$\beta_j$ = the effect of the jth day; j = 1, 2, …, 10
$(\alpha\beta)_{ij}$ = interaction between the ith car model and the jth day
$\varepsilon_{ijk}$ = random error corresponding to the kth car of the ith car model on the jth day

d. As mentioned in (a), day should be treated as a random factor. Justification is given in (a).

e. Estimation of the variance components for the model is readily done in PROC MIXED (Table 3).
The estimate of the variance component for factor day is $\hat\sigma_\beta^2 = 0.1478$.
The estimate of the variance component for the interaction is $\hat\sigma_{\alpha\beta}^2 = 0.4636$ (the Residual line in Table 3).
The variance component for the random error, $\varepsilon_{ijk}$, cannot be estimated separately because $n_{ij} = 1$. In other words, treatment combinations are not replicated.
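The REML estimates quoted in 1(e) and 2(e), and the interval in 1(f), can be cross-checked by hand from the mean squares and the expected mean squares printed by PROC GLM. The Python sketch below is only an illustration under the assumption of a balanced design, where solving the expected mean square equations (the ANOVA or method-of-moments approach) agrees with REML whenever the result is non-negative. The function name variance_component is hypothetical, and the t quantile 2.228 is copied from the solution rather than recomputed.

```python
import math


def variance_component(ms_factor, ms_denominator, coefficient):
    """Solve E[MS_factor] = E[MS_denominator] + coefficient * Var(factor)
    for Var(factor), using observed mean squares in place of expectations."""
    return (ms_factor - ms_denominator) / coefficient


# Problem 1 (Table 1): E[MS_soil] = Var(Error) + 4 Var(SOIL)
soil_var = variance_component(3124.868182, 209.992424, 4)

# Problem 2 (Table 4): E[MS_day] = Var(Error) + Var(CARMODEL*DAY) + 5 Var(DAY),
# and MS for CARMODEL*DAY estimates Var(Error) + Var(CARMODEL*DAY).
day_var = variance_component(1.20266667, 0.46355556, 5)

# Problem 1(f): standard error of the grand mean and the 95% CI.
n_total = 44
grand_mean = 155.11363636
se_mean = math.sqrt(3124.868182 / n_total)
t_crit = 2.228  # t(0.975; 10) as quoted in the solution text
ci = (grand_mean - t_crit * se_mean, grand_mean + t_crit * se_mean)

print(f"soil variance component: {soil_var:.4f}")
print(f"day variance component:  {day_var:.6f}")
print(f"SE of grand mean:        {se_mean:.5f}")
print(f"95% CI for grand mean:   ({ci[0]:.2f}, {ci[1]:.2f})")
```

The printed values can be compared against the SOIL and DAY lines of the Covariance Parameter Estimates in Tables 2 and 3 and against the OVERALL MEAN standard error in Table 2.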
f. Based on the results in Table 4, the test statistic for $H_0$: all $\alpha_i = 0$ (no car model effect) is

$F^* = MS_{car\ model} / MS_{interaction} = 8.87200000 / 0.46355556 = 19.14$

Because F* = 19.14 > F(0.95; 4, 36) = 2.6335, we reject $H_0$ and conclude that the car model effect on mileage per gallon is statistically significant.

The test statistic for $H_0: \sigma_\beta^2 = 0$ versus $H_A: \sigma_\beta^2 > 0$ (day effect) is

$F^* = MS_{day} / MS_{interaction} = 1.20266667 / 0.46355556 = 2.59$

Because F* = 2.59 > F(0.95; 9, 36) = 2.1526, we reject $H_0: \sigma_\beta^2 = 0$ and conclude that the day effect on mileage per gallon is statistically significant.

g. The results in Table 3 indicate that there is a significant difference between the control and the average of the four other car models (estimate = -1.4625, p-value = 0.0001).

h. The standard error of the mean mileage per gallon for car model A is

$SE(\bar{Y}_{A.}) = \sqrt{(\hat\sigma_{\alpha\beta}^2 + \hat\sigma_\beta^2) / \#blocks} = \sqrt{(0.46355556 + 0.14782222) / 10} = 0.24726$

where 0.46355556 is the Residual estimate and 0.14782222 is the DAY estimate in Table 3. The same value applies to every car model.

i. The standard error of the difference in mean mileage per gallon between two different car models i and i' is

$SE(\bar{Y}_{i.} - \bar{Y}_{i'.}) = \sqrt{2 \hat\sigma_{\alpha\beta}^2 / \#blocks} = \sqrt{2 \times 0.46355556 / 10} = 0.304485$

3. Two-way ANOVA random effects model.

a. Factor type of packaging material and factor brand of margarine are used in this experiment. Both factors are random, as each consists of levels randomly selected from a larger population of levels.

b. A linear model for this experiment is given below.

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

$\mu$ = the overall population mean
$\alpha_i$ = the effect of the ith type of package
$\beta_j$ = the effect of the jth brand of margarine
$(\alpha\beta)_{ij}$ = interaction between the ith type of package and the jth brand of margarine
$\varepsilon_{ijk}$ = the experimental error

The assumptions are as follows: $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, and $\varepsilon_{ijk}$ are independent. Also
$\alpha_i$ are independently and identically $N(0, \sigma_\alpha^2)$
$\beta_j$ are independently and identically $N(0, \sigma_\beta^2)$
$(\alpha\beta)_{ij}$ are independently and identically $N(0, \sigma_{\alpha\beta}^2)$
$\varepsilon_{ijk}$ are independently and identically $N(0, \sigma^2)$

c.
Estimates of the variance components for the random factors are in the table below (from Table 6).

Covariance Parameter Estimates (REML)

Cov Parm              Estimate
PACKAGE           399.47482616
BRAND             1791.5593069
PACKAGE*BRAND       0.00000000
Residual          1637.3270315

The estimate of the variance component for package is $\hat\sigma_\alpha^2 = 399.47482616$.
The estimate of the variance component for brand is $\hat\sigma_\beta^2 = 1791.5593069$.
The estimate of the variance component for the interaction package*brand is $\hat\sigma_{\alpha\beta}^2 = 0.00$.
The estimate of the variance component for the experimental error is $\hat\sigma^2 = 1637.3270315$.

d. Testing the significance of the effects in the model (Table 5).

Test of the interaction effect: $F^* = MSAB / MSE = 580.87222 / 1989.47778 = 0.29$
Test of the effect of factor brand: $F^* = MS_{brand} / MSAB = 28510.9 / 580.87222 = 49.08$
Test of the effect of factor package: $F^* = MS_{package} / MSAB = 8827.87222 / 580.87222 = 15.20$

The interaction is not statistically significant (p-value = 0.9983).
The effect of brand is highly significant (p-value = 0.0001).
The effect of package is highly significant (p-value = 0.0001 when tested against MSAB; the value 0.0033 in Table 5 is for the test against MSE).

e. The covariance between two replicates with the same package and the same brand = $\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_{\alpha\beta}^2$.

f. The covariance between two replicates with different packages but the same brand = $\sigma_\beta^2$.

g. The variance of a randomly selected observation = $\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2$.

h. The 95% CI for the overall population mean is, using the ESTIMATE results in Table 6,

$\bar{Y}_{...} \pm t(0.975; 4) \, SE(\bar{Y}_{...}) = 559.10 \pm 2.776 \times 19.917 = 559.10 \pm 55.29 = (503.81, 614.39)$

We are 95% confident that the population mean sale volume is between $503,810 and $614,390.
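The F ratios in 3(d) and the interval in 3(h) can be reproduced from the numbers in Tables 5 and 6. The Python sketch below is a hedged illustration, not the method used to produce the SAS output: it assumes the balanced two-way random effects model, for which the variance of the grand mean is $\sigma_\alpha^2/a + \sigma_\beta^2/b + \sigma_{\alpha\beta}^2/(ab) + \sigma^2/(abn)$, and it takes the degrees of freedom (4) from the ESTIMATE line of Table 6 and the quantile t(0.975; 4) = 2.776 from a standard t table. All variable names are introduced here for illustration.

```python
import math

# Design sizes: a packages, b brands, n stores per treatment combination.
a, b, n = 5, 6, 3

# Mean squares from Table 5.
ms_package, ms_brand = 8827.87222, 28510.90000
ms_interaction, ms_error = 580.87222, 1989.47778

# In the random effects model the main effects are tested against the
# interaction mean square, and the interaction against the error mean square.
f_interaction = ms_interaction / ms_error
f_brand = ms_brand / ms_interaction
f_package = ms_package / ms_interaction

# REML variance component estimates from Table 6.
var_package, var_brand = 399.47482616, 1791.5593069
var_interaction, var_error = 0.0, 1637.3270315

# Variance of the grand mean under the balanced two-way random model.
var_mean = (var_package / a + var_brand / b
            + var_interaction / (a * b) + var_error / (a * b * n))
se_mean = math.sqrt(var_mean)

grand_mean = 559.10
t_crit = 2.776  # tabled t(0.975; 4); df = 4 as reported in Table 6
lower = grand_mean - t_crit * se_mean
upper = grand_mean + t_crit * se_mean

print(f"F interaction = {f_interaction:.2f}")
print(f"F brand       = {f_brand:.2f}")
print(f"F package     = {f_package:.2f}")
print(f"SE of mean    = {se_mean:.5f}")
print(f"95% CI        = ({lower:.2f}, {upper:.2f}) thousand US dollars")
```

The standard error computed this way can be compared with the Std Error reported for MEAN in Table 6, which is a useful check that the variance components were read off correctly.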