Fall 2013
STATISTICS 479
Assignment #8 (40 points)

Instructions: Turn in the programs, any plots, and the handwritten answers for each problem.

1. A marketing consultant conducted an experiment to compare four different package designs for a new breakfast cereal. Twenty-four stores with approximately similar sales volumes were selected, and each store was required to carry only one of the package designs. Thus each package design was randomly assigned to six stores. Sales, in number of cases, were observed for the study period. The data are given below:

          Package Design
        1     2     3     4
       12    14    19    24
       18    12    17    30
       14    13    21    27
       15    10    23    28
       17    15    16    32
       15    12    20    30

    Package Design
    1   3 colors, with cartoons
    2   3 colors, without cartoons
    3   5 colors, with cartoons
    4   5 colors, without cartoons

Prepare and run a SAS program to obtain the output necessary to provide all of the following information. You must extract numbers from the SAS output and write your own answers on a separate sheet of paper.

(a) Assuming the fixed effects one-way classification model for this data, give estimates of the true mean sales volumes µ1, µ2, µ3, µ4, and the error variance σ². Write down the corresponding analysis of variance table including the p-value. State the hypothesis tested by the F-statistic and your decision based on the p-value. Use α = .05.

(b) Use contrast statements to compute F-statistics that
    i. compare the average effect of the 3-color designs with the average effect of the 5-color designs,
    ii. compare the average effect of the designs with cartoons with the average effect of the designs without cartoons,
    iii. compare the effect of the 3-color design with cartoons with the effect of the 3-color design without cartoons, and
    iv. compare the effect of the 5-color design with cartoons with the effect of the 5-color design without cartoons.
    Add lines containing the corresponding sum of squares, degrees of freedom, and the F-statistic in an expanded anova table.
    Based on the p-values, what are your conclusions from each of these tests? Use α = .05.

(c) Include an estimate statement for each of the four comparisons above. Use the results from these statements to obtain a t-test of each comparison. State your decision based on the p-value. Use α = .05.

(d) Compute 95% confidence intervals for all pairwise differences in true mean sales volumes, i.e., the (µp − µq)'s. Extract the 6 confidence intervals of interest from the output and report them separately.

(e) Use the confidence intervals in part (d) to find the differences that are significant by checking which intervals do not include zero. What is your conclusion about the true mean sales volumes corresponding to the four package designs?

2. Six samples of each of four types of cereal grain grown in a certain region were randomly selected and analyzed to determine the thiamin content (mcg/gm) in an experiment. The data are:

    Cereal: Wheat Barley Maize Oats
    Thiamin content (mcg/gm): 5.2 4.5 6.5 7.0 5.8 4.7 8.3 6.7 6.0 6.1 6.7 5.8 6.1 7.5 5.9 5.7 6.4 4.9 6.0 5.2 7.8 7.0 5.9 7.2

Prepare and run a SAS program to obtain the output necessary to provide all of the following information. You must extract numbers from the SAS output and write your own answers on a separate sheet of paper.

(a) Assuming the fixed effects one-way classification model for this data, give estimates of the true mean thiamin content µ1, µ2, µ3, µ4, and the error variance σ². Write down the corresponding analysis of variance table including the p-value. State the hypothesis tested by the F-statistic and your decision based on the p-value.

(b) Include a statement to compare true thiamin content means using the LSD procedure with α = .05. Use the letters on the output to conduct the underlining method on your answer sheet, showing the ordered sample means labelled with the corresponding treatment. Make a concluding statement.
(c) Include a statement to compare true thiamin content means using the TUKEY procedure with α = .05. Use the letters on the output to conduct the underlining method on your answer sheet, showing the ordered sample means labelled with the corresponding treatment. Make a concluding statement.

(d) In what way are the conclusions from parts (b) and (c) different?

3. The data displayed below are results from an experiment on the use of drugs in the treatment of leprosy. The drugs were A and D, which were antibiotics, and F, an inert drug used as a control. The dependent variable Y (PostScore) was a score of leprosy bacilli measured on each patient after several months of treatment. The covariate X (PreScore) was a pretreatment score of leprosy bacilli.

                     Drugs
          A            D            F
        X    Y       X    Y       X    Y
       11    6       6    0      16   13
        8    0       6    2      13   10
        5    2       7    3      11   18
       14    8       8    1       9    5
       19   11      18   18      21   23
        6    4       8    4      16   12
       10   13      19   14      12    5
        6    1       8    9      12   16
       11    8       5    1       7    1
        3    0      15    9      12   20

(a) Use proc glm and the one-way covariance (equal slopes) model to analyze this data. Write down the model first, identifying each term as in the textbook. Construct an adjusted anova table as on page 311 of the text.

(b) Using the above anova table, test the hypothesis H0: µ1 = µ2 = µ3 (use the p-value and state your decision).

(c) Construct 95% confidence intervals for all differences in pairs of means (e.g., µ1 − µ2), adjusted for multiple testing using the Bonferroni method.

(d) What does the test of H0: β = 0 tell you? Test this hypothesis using the above adjusted anova table and state your conclusion.

(e) Construct an analysis of variance that is not adjusted for the pre-score. What conclusion can you draw from this anova table?

4. A textile mill weaves a fabric on a large number of looms. To investigate whether there is an appreciable variation among the output of cloth per minute by the looms, the process engineer selects 5 looms at random and measures their output on 5 randomly chosen days.
The following data are obtained:

    Loom: 1 2 3 4 5
    Output (lb/min): 14.0 13.9 14.1 13.6 13.8 14.1 14.2 14.0 13.8 13.9 14.0 14.2 14.1 14.0 13.8 13.9 13.7 13.6 13.9 13.8 14.0 13.9 14.0

(a) Write the one-way random model you will use to analyze this data, stating the assumptions about each parameter in the model and what each parameter represents. Construct the corresponding analysis of variance using the SAS MIXED procedure. Write the anova table, including a column for Expected Mean Square (EMS).

(b) Express the hypothesis that there is no variability in output among the looms in terms of the model parameters. Perform a test of this hypothesis using the analysis in part (a).

(c) If the hypothesis in part (b) is rejected, estimates of the variance components associated with the model in part (a) may be desired. Obtain these estimates using the results of parts (a) and (b).

Notes:
• Along with the data, I have included pieces of SAS code that will create the SAS data sets necessary to perform the analyses required for each problem. These are downloadable from the Homework Assignments page as usual.
• When the levels of the factors are of character type, it is recommended that you use the order=data option in the proc statements to preserve the ordering of the class levels found in the input data.

Due Thursday, December 12, 2013
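
As an illustration of the second note, the following is a minimal sketch of where the order=data option goes, using the leprosy data from Problem 3. The data set name (leprosy) and the variable names (drug, pre, post) are assumptions made for this sketch; they need not match the names in the SAS code supplied with the assignment. proc means is used here only to show the option; the same option is also accepted on the proc glm and proc mixed statements.

```sas
/* Illustrative sketch only: the data set name (leprosy) and the variable
   names (drug, pre, post) are assumptions, not necessarily the names
   used in the code supplied with the assignment. */
data leprosy;
  /* @@ holds the input line so several (drug, pre, post) triples are read from it */
  input drug $ pre post @@;
  datalines;
A 11  6  A  8  0  A  5  2  A 14  8  A 19 11
A  6  4  A 10 13  A  6  1  A 11  8  A  3  0
D  6  0  D  6  2  D  7  3  D  8  1  D 18 18
D  8  4  D 19 14  D  8  9  D  5  1  D 15  9
F 16 13  F 13 10  F 11 18  F  9  5  F 21 23
F 16 12  F 12  5  F 12 16  F  7  1  F 12 20
;
run;

/* order=data lists the class levels in the order they first appear in the data */
proc means data=leprosy order=data n mean std;
  class drug;
  var pre post;
run;
```

For the drugs A, D, F the data order happens to coincide with alphabetical order, so the option changes nothing here. For character levels such as Wheat, Barley, Maize, Oats in Problem 2, the default ordering would sort them alphabetically (Barley, Maize, Oats, Wheat), which is the situation the note is meant to avoid.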