NCSU ST512 HOMEWORK 4 2011

1) Consider the sample of n_F = 18 girls and n_M = 22 boys presented below as a random sample from a population of interest.

Data

Obs  name      sex  height      Obs  name      sex  height
  1  KATIE     F    59           21  FREDERIC  M    63
  2  LOUISE    F    61           22  ALFRED    M    64
  3  JANE      F    55           23  HENRY     M    65
  4  JACLYN    F    66           24  LEWIS     M    64
  5  LILLIE    F    52           25  EDWARD    M    68
  6  TIM       M    60           26  CHRIS     M    64
  7  JAMES     M    61           27  JEFFREY   M    69
  8  ROBERT    M    51           28  MARY      F    62
  9  BARBARA   F    60           29  AMY       F    64
 10  ALICE     F    61           30  ROBERT    M    67
 11  SUSAN     F    56           31  WILLIAM   M    65
 12  JOHN      M    65           32  CLAY      M    66
 13  JOE       M    63           33  MARK      M    62
 14  MICHAEL   M    58           34  DANNY     M    66
 15  DAVID     M    59           35  MARTHA    F    65
 16  JUDY      F    61           36  MARION    F    60
 17  ELIZABET  F    62           37  PHILLIP   M    68
 18  LESLIE    F    65           38  LINDA     F    62
 19  CAROL     F    63           39  KIRK      M    68
 20  PATTY     F    62           40  LAWRENCE  M    70

Uss: uncorrected sum of squares
Css: corrected sum of squares

Summary

Obs  sex      _TYPE_  _FREQ_  mean_height  nobs  uss     css      std_dev
1    overall  0       40      62.5500      40    157202  701.900  4.24234
2    F        1       18      60.8889      18     66956  221.778  3.61189
3    M        1       22      63.9091      22     90246  389.818  4.30845

(a) Use regression with an indicator variable to conduct an equal variances t-test of the hypothesis that the average heights of the two populations (boys and girls) are equal. Write the hypothesis.
(b) Is this hypothesis plausible in light of these data?
(c) Also, use PROC TTEST to run a two-sample comparison of means and compare the results.
(d) How is the pooled sample variance from the two-sample comparison of means related to the error mean square from the regression?
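As a rough check on the arithmetic behind parts (a) and (d), the equal-variances t statistic can be sketched from the Summary table alone. This is a minimal sketch, not the SAS output the problem asks for. It assumes the indicator is coded X = 1 for boys and X = 0 for girls (the coding is not specified in the problem), in which case the least-squares slope equals the difference in sample means and the regression error sum of squares equals the sum of the two within-group corrected sums of squares. All variable names are illustrative.

```python
import math

# Summary statistics copied from the Summary table (F = girls, M = boys).
n_f, mean_f, css_f = 18, 60.8889, 221.778
n_m, mean_m, css_m = 22, 63.9091, 389.818

# Pooled variance: within-group corrected SS added together, divided by
# the error degrees of freedom n_F + n_M - 2.
df_error = n_f + n_m - 2
s2_pooled = (css_f + css_m) / df_error

# Assumed coding: X = 1 for boys, 0 for girls, so the fitted slope is
# the difference in sample means (boys minus girls).
slope = mean_m - mean_f
se_slope = math.sqrt(s2_pooled * (1 / n_f + 1 / n_m))
t_stat = slope / se_slope

print(f"pooled variance = {s2_pooled:.4f} on {df_error} df")
print(f"t = {t_stat:.4f}")
```

The t statistic would then be compared with a t distribution on 38 degrees of freedom, for example via PROC TTEST as in part (c).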
ST512 July 19 2011

2) Four randomly selected seedlings were grown under each of t = 5 experimental Light/Darkness conditions, and heights (Y) at four weeks were measured:

Data

treat  y
1      32.94  35.98  34.76  32.40
2      30.55  32.64  32.37  32.04
3      31.23  31.09  30.62  30.42
4      34.41  34.88  34.07  33.87
5      35.61  35.00  33.65  32.91

Summary

Trt  Light Treatment              Sample mean  Sample variance
1    Total Darkness               34.02        2.73
2    Low intensity Light          31.9         0.87
3    Medium intensity Light       30.84        0.15
4    Medium-High intensity Light  34.31        0.20
5    High intensity Light         34.29        1.52

(a) Write a general linear model using dummy variables X1 (trt=1), X2 (trt=2), X3 (trt=3), X4 (trt=4), X5 (trt=5). Use i = 1, . . . , 20.
(b) Present this model in matrix form: X-matrix, beta-coefficient vector, and Y-vector.
(c) Run a regression analysis of Y on X1, X2, X3, X4.
(d) Conduct an F-test for the null hypothesis that none of the treatments have any effect on mean plant height. Write down the hypothesis being tested. Conclusions. Use $\alpha = 0.05$.
(e) Compute the predicted mean height for each treatment and its standard error. Note that the predicted mean for treatment 1 is given by

    $\hat{\mu}_1 = \hat{\beta}_0 + \hat{\beta}_1 = a'\hat{\beta}$, with $a' = (1\ 1\ 0\ 0\ 0)$ and $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4)'$,

    and its variance by $\mathrm{var}(a'\hat{\beta}) = a'\,\mathrm{var}(\hat{\beta})\,a$.
(f) Calculate a 95% confidence interval for the predicted mean height for trt 1.
(g) Calculate a 95% confidence interval for the predicted mean height for trt 5.
(h) Calculate a 95% confidence interval for the mean difference between treatments 1 and 5.
(i) The mean difference is given by $\hat{\mu}_1 - \hat{\mu}_5 = a'\hat{\beta}$. Find $a'$. Calculate the standard error for the mean difference.
(j) Plot residuals vs predicted. Should we be concerned about the validity of the results?

3) A sample of n = 30 subjects was randomly assigned to three therapies/treatments for improving mental capacity. On each subject, pretest (Z) and posttest (Y) measurements were made.
Data (Rao 12.3a)

   T1        T2        T3
 z   y     z   y     z   y
=========================
24  45    23  28    27  34
28  50    33  39    27  31
38  59    31  36    44  55
42  60    34  39    38  43
24  47    18  22    32  44
39  66    24  28    26  28
45  76    41  49    24  33
19  50    34  39    13  13
19  39    30  33    36  39
22  36    39  43    52  58

a) Calculate the difference (DIFF) between post-test and pre-test scores for each subject.
b) Construct the one-way ANOVA table for comparing the three treatment means.
c) Conduct an F-test for equality of means. That is, specify a model and a null hypothesis for no therapy effect, then compute the F-ratio. F(.05, 2, 27) = 3.35.
d) Plot residuals vs predicted. Check assumptions. Run the Brown-Forsythe homogeneity test. Conclusions.

Homogeneity of variance test (SAS Manual)

One of the usual assumptions for the GLM procedure is that the underlying errors are all uncorrelated with homogeneous variances. You can test this assumption in PROC GLM by using the HOVTEST option in the MEANS statement, requesting a homogeneity of variance test. This section discusses the computational details behind these tests. Note that the GLM procedure allows homogeneity of variance testing for simple one-way models only.

Bartlett (1937) proposes a test for equal variances that is a modification of the normal-theory likelihood ratio test (the HOVTEST=BARTLETT option). While Bartlett's test has accurate Type I error rates and optimal power when the underlying distribution of the data is normal, it can be very inaccurate if that distribution is even slightly nonnormal (Box 1953).

An approach that leads to tests that are much more robust to the underlying distribution is to transform the original values of the dependent variable to derive a dispersion variable and then to perform analysis of variance on this variable. The significance level for the test of homogeneity of variance is the p-value for the ANOVA F-test on the dispersion variable.
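The dispersion-variable approach described above can be sketched in a few lines. This is a minimal illustration, not the PROC GLM implementation. It assumes the response is DIFF (posttest minus pretest) from the data table, and it takes the absolute deviation from the group median as the dispersion variable, which is one possible choice. The helper name one_way_f is hypothetical.

```python
from statistics import median

# (pretest z, posttest y) pairs for each treatment, from the data table.
data = {
    "T1": [(24, 45), (28, 50), (38, 59), (42, 60), (24, 47),
           (39, 66), (45, 76), (19, 50), (19, 39), (22, 36)],
    "T2": [(23, 28), (33, 39), (31, 36), (34, 39), (18, 22),
           (24, 28), (41, 49), (34, 39), (30, 33), (39, 43)],
    "T3": [(27, 34), (27, 31), (44, 55), (38, 43), (32, 44),
           (26, 28), (24, 33), (13, 13), (36, 39), (52, 58)],
}

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# DIFF = posttest - pretest for each subject, grouped by treatment.
diff = [[y - z for z, y in pairs] for pairs in data.values()]

# Dispersion variable (assumed choice): absolute deviation from the group median.
dispersion = [[abs(v - median(g)) for v in g] for g in diff]

f_stat, df_b, df_w = one_way_f(dispersion)
print(f"F on dispersion variable = {f_stat:.3f} with ({df_b}, {df_w}) df")
```

The resulting F-ratio would be referred to an F distribution with the printed degrees of freedom to obtain the p-value for the homogeneity test.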
Brown and Forsythe (1974) suggest using the absolute deviations from the group medians: $z_{ij}^{BF} = |y_{ij} - m_i|$, where $m_i$ is the median of the $i$th group. You can use the HOVTEST=BF option to specify this test. Simulation results show that the Brown-Forsythe test seems best at providing power to detect variance differences while protecting the probability of a Type I error.

e) Test the difference between treatments T2 and T3. Use a t-test for unequal variances.
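For the unequal-variances comparison of T2 and T3, the test statistic and its approximate degrees of freedom can be sketched as follows. This is a minimal sketch under stated assumptions: the response is DIFF (posttest minus pretest), and the degrees of freedom use the Satterthwaite approximation, which the problem does not name explicitly. The helper name mean_var is hypothetical.

```python
import math

# (pretest z, posttest y) pairs for treatments T2 and T3, from the data table.
t2 = [(23, 28), (33, 39), (31, 36), (34, 39), (18, 22),
      (24, 28), (41, 49), (34, 39), (30, 33), (39, 43)]
t3 = [(27, 34), (27, 31), (44, 55), (38, 43), (32, 44),
      (26, 28), (24, 33), (13, 13), (36, 39), (52, 58)]

def mean_var(x):
    """Return the sample mean and the sample variance (divisor n - 1)."""
    m = sum(x) / len(x)
    return m, sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

# DIFF = posttest - pretest for each subject.
d2 = [y - z for z, y in t2]
d3 = [y - z for z, y in t3]

n2, n3 = len(d2), len(d3)
m2, v2 = mean_var(d2)
m3, v3 = mean_var(d3)

# Unequal-variances t statistic: each group contributes its own variance.
se = math.sqrt(v2 / n2 + v3 / n3)
t_stat = (m2 - m3) / se

# Satterthwaite approximation to the degrees of freedom (assumed here).
df = (v2 / n2 + v3 / n3) ** 2 / (
    (v2 / n2) ** 2 / (n2 - 1) + (v3 / n3) ** 2 / (n3 - 1)
)

print(f"t = {t_stat:.4f}, approximate df = {df:.2f}")
```

The statistic would then be compared with a t distribution on the approximate degrees of freedom, which is what PROC TTEST reports on its unequal-variances line.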