HW4_july182011

NCSU
ST512
HOMEWORK 4
2011
1) Consider the sample of nF = 18 girls and nM = 22 boys presented below as a random sample from a
population of interest.
Data
Obs  name      sex  height
  1  KATIE     F    59
  2  LOUISE    F    61
  3  JANE      F    55
  4  JACLYN    F    66
  5  LILLIE    F    52
  6  TIM       M    60
  7  JAMES     M    61
  8  ROBERT    M    51
  9  BARBARA   F    60
 10  ALICE     F    61
 11  SUSAN     F    56
 12  JOHN      M    65
 13  JOE       M    63
 14  MICHAEL   M    58
 15  DAVID     M    59
 16  JUDY      F    61
 17  ELIZABET  F    62
 18  LESLIE    F    65
 19  CAROL     F    63
 20  PATTY     F    62
 21  FREDERIC  M    63
 22  ALFRED    M    64
 23  HENRY     M    65
 24  LEWIS     M    64
 25  EDWARD    M    68
 26  CHRIS     M    64
 27  JEFFREY   M    69
 28  MARY      F    62
 29  AMY       F    64
 30  ROBERT    M    67
 31  WILLIAM   M    65
 32  CLAY      M    66
 33  MARK      M    62
 34  DANNY     M    66
 35  MARTHA    F    65
 36  MARION    F    60
 37  PHILLIP   M    68
 38  LINDA     F    62
 39  KIRK      M    68
 40  LAWRENCE  M    70
Uss: uncorrected sum of squares
Css: corrected sum of squares
Summary
Obs  sex      _TYPE_  _FREQ_  mean_height  nobs  uss     css      std_dev
1    overall  0       40      62.5500      40    157202  701.900  4.24234
2    F        1       18      60.8889      18    66956   221.778  3.61189
3    M        1       22      63.9091      22    90246   389.818  4.30845
(a) Use regression with an indicator variable to conduct an equal-variances t-test of the hypothesis that the average
heights of the two populations (boys and girls) are equal. Write the hypothesis.
(b) Is this hypothesis plausible in light of these data?
(c) Also, use PROC TTEST to run a two-sample comparison of means and compare the results.
(d) How is the pooled sample variance from the two-sample comparison of means related to the error mean
square from the regression?
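The relationship asked about in parts (a) and (d) can be checked numerically from the Summary table alone. The following is a minimal sketch in Python (not the SAS run the assignment asks for), assuming the indicator is coded x = 1 for boys and x = 0 for girls; that coding and all variable names are illustrative choices, not part of the assignment.

```python
import math

# Summary statistics copied from the Summary table (n, mean, css by sex).
n_f, mean_f, css_f = 18, 60.8889, 221.778
n_m, mean_m, css_m = 22, 63.9091, 389.818

# Pooled (equal-variances) two-sample t-test.
df = n_f + n_m - 2
pooled_var = (css_f + css_m) / df
se_diff = math.sqrt(pooled_var * (1 / n_f + 1 / n_m))
t_pooled = (mean_m - mean_f) / se_diff

# Regression of height on the indicator x (assumed coding: 1 = M, 0 = F).
# With a single 0/1 predictor, least squares gives
#   b0 = mean of the x = 0 group, b1 = difference in group means,
# and the error sum of squares is the sum of the within-group css.
b0 = mean_f
b1 = mean_m - mean_f
sse = css_f + css_m
mse = sse / df
sxx = n_f * n_m / (n_f + n_m)  # corrected sum of squares of the indicator
se_b1 = math.sqrt(mse / sxx)
t_reg = b1 / se_b1

print(f"pooled variance = {pooled_var:.4f}, regression MSE = {mse:.4f}")
print(f"t (pooled) = {t_pooled:.4f}, t (slope) = {t_reg:.4f}, df = {df}")
```

Under this coding the slope estimate is the difference in sample means, so the t statistic for the slope and the pooled t statistic coincide, and the error mean square equals the pooled sample variance.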
ST512
July 19
2011
2) Four randomly selected seedlings were grown under each of t = 5 experimental Light/Darkness conditions, and
heights (Y) at four weeks were measured:
Data
treat  y
1      32.94
1      35.98
1      34.76
1      32.40
2      30.55
2      32.64
2      32.37
2      32.04
3      31.23
3      31.09
3      30.62
3      30.42
4      34.41
4      34.88
4      34.07
4      33.87
5      35.61
5      35.00
5      33.65
5      32.91
Summary
Trt  Light Treatment              Sample mean  Sample variance
1    Total Darkness               34.02        2.73
2    Low intensity Light          31.9         0.87
3    Medium intensity Light       30.84        0.15
4    Medium-High intensity Light  34.31        0.20
5    High intensity Light         34.29        1.52
(a) Write a general linear model using dummy variables X1 (trt=1), X2 (trt=2), X3 (trt=3), X4 (trt=4), X5
(trt=5). Use i = 1, . . . , 20.
(b) Present this model in matrix form: X-matrix, Beta-coefficient vector, and Y-vector.
(c) Run a regression analysis of Y on X1, X2, X3, X4.
(d) Conduct an F-test for the null hypothesis that none of the treatments has any effect on mean plant height.
Write down the hypothesis being tested and state your conclusions. Use α = 0.05.
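(e) Compute the predicted mean height for each treatment and its standard error.
(f) Calculate a 95% confidence interval for the predicted mean height for trt 1.
(g) Calculate a 95% confidence interval for the predicted mean height for trt 5.
(h) Calculate a 95% confidence interval for the mean difference between treatments 1 and 5. Note that the predicted
mean for treatment 1 is given by
\[
\hat{\mu}_1 = a'\hat{\beta} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \\ \hat{\beta}_4 \end{pmatrix},
\qquad \mathrm{var}(\hat{\mu}_1) = a' \Sigma a,
\]
where $\Sigma$ is the variance-covariance matrix of $\hat{\beta}$.
(i) The mean difference can be written as $\hat{\mu}_1 - \hat{\mu}_5 = a'\hat{\beta}$. Find a'. Calculate the standard error for the mean difference.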
(j) Plot residuals vs predicted. Should we be concerned about the validity of the results?
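The matrix calculations behind parts (b) through (i) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS analysis requested: treatment 5 is taken as the baseline (so Y is regressed on X1 to X4 plus an intercept), the helper names `invert` and `estimate` are hypothetical, and the critical value t(0.975, 15) = 2.131 is taken from a standard t table.

```python
import math

# Seedling heights from the Data table, keyed by treatment.
heights = {
    1: [32.94, 35.98, 34.76, 32.40],
    2: [30.55, 32.64, 32.37, 32.04],
    3: [31.23, 31.09, 30.62, 30.42],
    4: [34.41, 34.88, 34.07, 33.87],
    5: [35.61, 35.00, 33.65, 32.91],
}

# Design matrix: intercept plus X1..X4 (assumption: treatment 5 is the baseline).
X, y = [], []
for trt, values in heights.items():
    for v in values:
        X.append([1.0] + [1.0 if trt == j else 0.0 for j in range(1, 5)])
        y.append(v)

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(m)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for c in range(k):
        pr = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[pr] = aug[pr], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

p = 5  # number of regression coefficients
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
xtx_inv = invert(xtx)
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]

# Sums of squares and the overall F-ratio for H0: beta1 = ... = beta4 = 0.
n = len(y)
fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ybar = sum(y) / n
ssr = sum((fi - ybar) ** 2 for fi in fitted)
mse = sse / (n - p)
f_stat = (ssr / (p - 1)) / mse

def estimate(a):
    """Return a'beta and its standard error sqrt(MSE * a'(X'X)^-1 a)."""
    est = sum(ai * bi for ai, bi in zip(a, beta))
    quad = sum(a[i] * xtx_inv[i][j] * a[j] for i in range(p) for j in range(p))
    return est, math.sqrt(mse * quad)

T_CRIT = 2.131  # assumed table value of t(0.975, 15)
for label, a in [("trt 1", [1, 1, 0, 0, 0]),
                 ("trt 5", [1, 0, 0, 0, 0]),
                 ("trt 1 - trt 5", [0, 1, 0, 0, 0])]:
    est, se = estimate(a)
    print(f"{label}: {est:.3f} +/- {T_CRIT * se:.3f} (se = {se:.3f})")
print(f"F = {f_stat:.2f} on {p - 1} and {n - p} df, MSE = {mse:.4f}")
```

With this baseline the intercept estimates the treatment 5 mean and each remaining coefficient estimates a difference from treatment 5, which is why the contrast for the treatment 1 versus 5 difference picks out a single coefficient. A different baseline would change the vectors a' but not the estimated means or their standard errors.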
3) A sample of n = 30 subjects was randomly assigned to three therapies/treatments for improving mental capacity.
On each subject, pretest (Z) and posttest (Y) measurements were made.
Data (Rao 12.3a)
  T1      T2      T3
 z  y    z  y    z  y
======================
24 45   23 28   27 34
28 50   33 39   27 31
38 59   31 36   44 55
42 60   34 39   38 43
24 47   18 22   32 44
39 66   24 28   26 28
45 76   41 49   24 33
19 50   34 39   13 13
19 39   30 33   36 39
22 36   39 43   52 58
a) Calculate the difference (DIFF) between post-test and pre-test scores for each subject.
b) Construct the one-way ANOVA table for comparing the three treatment means.
c) Conduct an F-test for equality of means. That is, specify a model and a null hypothesis for no therapy effect,
then compute the F-ratio. Note that F(.05, 2, 27) = 3.35.
d) Plot residuals vs predicted. Check assumptions. Run the Brown-Forsythe homogeneity test. State your conclusions.
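The arithmetic for parts a) through c) can be sketched as follows. This is an illustrative Python calculation rather than the SAS output the assignment expects; the variable names are arbitrary, and the critical value 3.35 is the one quoted in part c).

```python
# (z, y) pairs from the Data table, by therapy.
data = {
    "T1": [(24, 45), (28, 50), (38, 59), (42, 60), (24, 47),
           (39, 66), (45, 76), (19, 50), (19, 39), (22, 36)],
    "T2": [(23, 28), (33, 39), (31, 36), (34, 39), (18, 22),
           (24, 28), (41, 49), (34, 39), (30, 33), (39, 43)],
    "T3": [(27, 34), (27, 31), (44, 55), (38, 43), (32, 44),
           (26, 28), (24, 33), (13, 13), (36, 39), (52, 58)],
}

# a) DIFF = post-test minus pre-test.
diff = {g: [y - z for z, y in pairs] for g, pairs in data.items()}

# b) One-way ANOVA sums of squares on DIFF.
n_total = sum(len(d) for d in diff.values())
grand_mean = sum(sum(d) for d in diff.values()) / n_total
means = {g: sum(d) / len(d) for g, d in diff.items()}
ss_trt = sum(len(d) * (means[g] - grand_mean) ** 2 for g, d in diff.items())
ss_err = sum((v - means[g]) ** 2 for g, d in diff.items() for v in d)
df_trt = len(diff) - 1
df_err = n_total - len(diff)
ms_trt = ss_trt / df_trt
ms_err = ss_err / df_err

# c) F-ratio compared with the tabled critical value.
f_ratio = ms_trt / ms_err
F_CRIT = 3.35  # F(.05, 2, 27), as given in part c)

print("Source     df   SS        MS")
print(f"Treatment  {df_trt:<4d} {ss_trt:<9.2f} {ms_trt:.2f}")
print(f"Error      {df_err:<4d} {ss_err:<9.2f} {ms_err:.2f}")
print(f"F = {f_ratio:.2f}; reject H0 at 0.05: {f_ratio > F_CRIT}")
```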
Homogeneity of variance test (SAS Manual)
One of the usual assumptions for the GLM procedure is that the underlying errors are all uncorrelated with homogeneous
variances. You can test this assumption in PROC GLM by using the HOVTEST option in the MEANS statement, requesting a
homogeneity of variance test. This section discusses the computational details behind these tests. Note that the GLM procedure
allows homogeneity of variance testing for simple one-way models only.
Bartlett (1937) proposes a test for equal variances that is a modification of the normal-theory likelihood ratio test (the
HOVTEST=BARTLETT option). While Bartlett's test has accurate Type I error rates and optimal power when the underlying
distribution of the data is normal, it can be very inaccurate if that distribution is even slightly nonnormal (Box 1953).
An approach that leads to tests that are much more robust to the underlying distribution is to transform the original values of the
dependent variable to derive a dispersion variable and then to perform analysis of variance on this variable. The significance level
for the test of homogeneity of variance is the p-value for the ANOVA F-test on the dispersion variable.
Brown and Forsythe (1974) suggest using the absolute deviations from the group medians:
$z_{ij}^{BF} = |y_{ij} - m_i|$, where $m_i$ is the median of the $i$th group. You can use the HOVTEST=BF option to specify this test.
Simulation results show that the Brown-Forsythe test seems best at providing power to detect variance differences while
protecting the probability of a Type I error.
e) Test the difference between treatments T2 and T3. Use a t-test for unequal variances.
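One common form of the unequal-variances t-test is Welch's test with the Satterthwaite approximation for the degrees of freedom. The sketch below assumes that form and assumes the test is applied to the DIFF scores (post-test minus pre-test) from part a); neither assumption is stated in the assignment, and the variable names are illustrative.

```python
import math
from statistics import mean, variance

# DIFF = y - z for treatments T2 and T3, computed from the Data table.
diff_t2 = [5, 6, 5, 5, 4, 4, 8, 5, 3, 4]
diff_t3 = [7, 4, 11, 5, 12, 2, 9, 0, 3, 6]

n2, n3 = len(diff_t2), len(diff_t3)
v2 = variance(diff_t2) / n2  # squared standard error of the T2 mean
v3 = variance(diff_t3) / n3  # squared standard error of the T3 mean

# Welch t statistic: no pooling of the two sample variances.
se = math.sqrt(v2 + v3)
t_stat = (mean(diff_t2) - mean(diff_t3)) / se

# Satterthwaite approximate degrees of freedom.
df = (v2 + v3) ** 2 / (v2 ** 2 / (n2 - 1) + v3 ** 2 / (n3 - 1))

print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```

The approximate degrees of freedom are generally not an integer; the statistic is then compared with a t distribution on that many degrees of freedom.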