Midterm2S06Sol


Statistics 512 Midterm 2 Spring 2006

Statistics 512

Midterm 2

April 11, 2006

The following rules apply.

1. You may bring 3 sheets of paper, double-sided, with any information you need.

2. You may use a calculator.

3. You may not collaborate or copy.

4. You may not use any device, such as a cell phone or PDA, that allows you to access the internet or other outside information.

5. Failure to comply with items 3 or 4 could lead to a reduction in your grade or to disciplinary action.

I have read the rules above and agree to comply with them.

Signature ________________________________________________

Name (printed) ___________________________________________

Problem   Your score   Total points
1                      21
2                      20
3                      19
Total                  60


1. PBG is a drug known to raise blood pressure in some animals. As a pilot study, the researchers wished to know whether rabbits vary in their response to the drug. 5 rabbits were used in the study. The change in blood pressure after administering 12.50 mg of PBG was recorded. The effects of PBG are very short-term, so it was possible to repeat the experiment on every rabbit, giving 2 measurements per rabbit.

a) (4) Write the one-way ANOVA model for these data assuming that rabbit is a fixed effect. Be sure to define all of your notation and include any constraints and distribution assumptions.

$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, 3, 4, 5$; $j = 1, 2$.

$Y_{ij}$ is the response of the $i$th rabbit on the $j$th measurement, $\mu$ is the population mean response, $\alpha_i$ is the effect of the $i$th rabbit, with $\sum_{i=1}^{5} \alpha_i = 0$, and $\varepsilon_{ij}$ is the random error, with the $\varepsilon_{ij}$ iid $N(0, \sigma^2)$.

1 point per term.

b) (2) Using the same notation as part (a), write the one-way ANOVA model for these data assuming that rabbit is a random effect. You do not need to redefine the variables, but you should include any constraints and distribution assumptions.

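$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where the $\alpha_i$ are iid $N(0, \sigma_\alpha^2)$, the $\varepsilon_{ij}$ are iid $N(0, \sigma^2)$, and the $\alpha_i$ and $\varepsilon_{ij}$ are independent.

1 point for iid Normal, 1 point for "$\alpha_i$ and $\varepsilon_{ij}$ are independent".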

c) (2) Is it more reasonable to assume that rabbit is a fixed effect or a random effect? Briefly justify your answer.

Rabbit should be a random effect, because the rabbits are representative of the population, not levels of specific interest.

1 point for random, 1 point for the right reason. (If the student says fixed but gives the "right" reason, they get 1 point.)

d) (3) Under the fixed effects model, what are:

The mean response for rabbit 1: $\mu + \alpha_1$

The covariance between the two responses for rabbit 1: 0

The covariance between the first response for rabbits 1 and 2: 0

1 point each.

d) (3) Under the random effects model, what are:

The mean response for rabbit 1: $\mu$

The covariance between the two responses for rabbit 1: $\sigma_\alpha^2$

The covariance between the first response for rabbits 1 and 2: 0

1 point each.

e) (2) The investigators decided to use the random effects model. Complete the ANOVA table.

Source     DF   Sum of Squares   Mean Square   F        p-value
rabbit      4   2.198500         0.5496        1.134    0.4358
Residual    5   2.422500         0.4845        XXXXXX   XXXXXX

p > .1 is an acceptable p-value.

1 point for each box.
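The boxes in the table can be checked numerically. The following is a minimal sketch, not the method used to produce the key: it forms the mean squares and the F ratio from the sums of squares in the table, and approximates the upper-tail probability of the F distribution with Simpson's rule. The helper names `f_pdf` and `f_upper_tail` are illustrative, and the numerical integration is only a stand-in for the F distribution function of a statistics package.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F(d1, d2) distribution at x."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

def f_upper_tail(f_value, d1, d2, n_intervals=2000):
    """Approximate P(F > f_value) with Simpson's rule on [0, f_value].

    Assumes d1 > 2 so that the density is 0 at the origin.
    """
    h = f_value / n_intervals
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3

# Values taken from the ANOVA table above.
ss_rabbit, df_rabbit = 2.198500, 4
ss_resid, df_resid = 2.422500, 5

ms_rabbit = ss_rabbit / df_rabbit      # mean square for rabbit
ms_resid = ss_resid / df_resid         # mean square error
f_stat = ms_rabbit / ms_resid          # F ratio for the rabbit effect
p_value = f_upper_tail(f_stat, df_rabbit, df_resid)

print(f"MS(rabbit)={ms_rabbit:.4f}  MSE={ms_resid:.4f}  F={f_stat:.3f}  p={p_value:.4f}")
```

In the one-way random effects model the rabbit mean square is tested against the residual mean square, which is why the F ratio uses 4 and 5 degrees of freedom.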

f) (3) Test whether or not there is a statistically significant rabbit effect. State the null and alternative hypotheses in terms of the parameters, give the p-value for the test, and state your conclusions in terms of correlations among observations.

$H_0: \sigma_\alpha^2 = 0$, $H_a: \sigma_\alpha^2 > 0$; p-value = 0.4358.

There is no evidence that the 2 observations taken on the same rabbit are correlated.

The answer must be compatible with the p-value computed for the ANOVA table.

1 point for the null and alternative hypotheses.

1 point for the p-value.

1 point for the conclusion stated in terms of correlation.

g) (2) The average response for Rabbit 1 was 1.61. When asked to estimate the mean response for Rabbit 1, SAS PROC GLM obtains the estimate 1.61, but SAS PROC MIXED obtains the estimate 1.80. Should we complain to SAS that there is a mistake in PROC MIXED? Briefly explain your answer.

No. SAS PROC GLM estimates the mean response using the fixed effects model. For this model, the estimated mean is $\bar{Y}_{1\cdot} = 1.61$.

SAS PROC MIXED estimates the mean response as the BLUP (best linear unbiased predictor). This is a weighted average of $\bar{Y}_{1\cdot}$ and $\bar{Y}_{\cdot\cdot}$, so the estimate for Rabbit 1 is shrunk toward the overall mean.

For full points, the answer must either use the term BLUP or say something about shrinking or averaging with $\bar{Y}_{\cdot\cdot}$.
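The shrinkage in the BLUP can be sketched numerically. The example below assumes the standard balanced one-way random effects form, in which the weight on the rabbit's own average is $\sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma^2 / n)$, and plugs in the method-of-moments estimates from the ANOVA table in part e. The overall mean is not given in the exam, so `hypothetical_grand_mean` is an invented value used only for illustration, and the function name `blup_group_mean` is likewise illustrative.

```python
def blup_group_mean(group_mean, grand_mean, var_group, var_error, n_per_group):
    """Shrink a group average toward the grand mean (balanced one-way random effects)."""
    weight = var_group / (var_group + var_error / n_per_group)
    return grand_mean + weight * (group_mean - grand_mean), weight

# Mean squares from the ANOVA table in part e; n = 2 measurements per rabbit.
ms_rabbit, ms_error, n = 0.5496, 0.4845, 2
var_error_hat = ms_error
var_rabbit_hat = (ms_rabbit - ms_error) / n   # method-of-moments estimate

# ASSUMPTION: the grand mean is not reported in the exam; this value is hypothetical.
hypothetical_grand_mean = 1.83

estimate, weight = blup_group_mean(1.61, hypothetical_grand_mean,
                                   var_rabbit_hat, var_error_hat, n)
print(f"weight on rabbit 1 average = {weight:.3f}, shrunken estimate = {estimate:.2f}")
```

Because the estimated rabbit variance is small relative to the error variance, the weight on the rabbit's own average is small and the prediction sits close to the overall mean, which is the behaviour the answer describes.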


2. PBG is a drug known to raise blood pressure in some animals. 5 rabbits were used in a study of PBG. The response is the difference in blood pressure before and after the PBG dose was administered. The doses used were 6.25, 12.5, 25, 50, 100 and 200 mg. One question of interest is whether the effect of PBG is linear in dose.

Since the effects of PBG are very short-term, each rabbit was given each dose twice, in random order, so that there are 12 measurements (2 at each dose) for every rabbit. Within each rabbit, the measurements are thought to be independent.

a) (2) Should dose of PBG be considered a fixed or random effect in this study? Briefly justify your answer.

PBG should be considered a fixed effect because it is quantitative.

This is the only acceptable answer.

b) (1) Below is a boxplot of the response as a function of dose. What assumption of ANOVA appears to be violated?

[Figure: boxplots of the response (change in blood pressure) at each dose; horizontal axis: dose (6.25, 12.5, 25, 50, 100, 200).]

The variance of the observations appears to increase with dose, so the constant variance assumption is violated.

c) (4) The investigators decided that the appropriate response variable is log(BPfinal/BPinitial). The investigators decided to fit a 2-way ANOVA model with a random effect for rabbit and a fixed effect for PBG. Write out the factor effects model for this experiment. Be sure to define all terms and include all constraints and distribution assumptions.

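$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$, $i = 1, 2, 3, 4, 5$; $j = 1, \ldots, 6$; $k = 1, 2$.

$Y_{ijk}$ is the response for the $k$th measurement of the $i$th rabbit given the $j$th dose,

$\mu$ is the overall mean response,

$\alpha_i$ is the effect of the $i$th rabbit,

$\beta_j$ is the effect of the $j$th dose, with $\sum_{j=1}^{6} \beta_j = 0$,

$(\alpha\beta)_{ij}$ is the interaction of the $i$th rabbit and the $j$th dose,

$\varepsilon_{ijk}$ is the random error,

$\alpha_i$, $(\alpha\beta)_{ij}$ and $\varepsilon_{ijk}$ are independent,

the $\alpha_i$ are iid $N(0, \sigma_\alpha^2)$, the $(\alpha\beta)_{ij}$ are iid $N(0, \sigma_{\alpha\beta}^2)$, and the $\varepsilon_{ijk}$ are iid $N(0, \sigma^2)$.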

1 point for listing all the terms

1 point for the constraint on the fixed effect

1 point for the Normality assumptions on the random components

1 point for all the random components independent

Note: Since this is a mixed model, students could also give the constrained mixed model. This adds a constraint on the interaction and redefines the variance components. It is right if the answer matches the book.

d) (1) Using the model you wrote in part c, what is the covariance between the two measurements taken on the same rabbit at dose 12.5?

$\sigma_\alpha^2 + \sigma_{\alpha\beta}^2$

e) (1) Using the model you wrote in part c, what is the covariance between two measurements taken on different rabbits at dose 12.5?

0
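The answers to parts d and e both follow from counting which random terms two observations share. A minimal sketch of that rule, assuming the unconstrained mixed model of part c with independent random terms; the function name `cov_mixed` and the numeric variance components are hypothetical and chosen only to make the cases easy to tell apart:

```python
def cov_mixed(obs1, obs2, var_rabbit, var_inter, var_error):
    """Covariance of two observations (rabbit i, dose j, replicate k) in the
    two-way mixed model with random rabbit and rabbit*dose effects."""
    i1, j1, k1 = obs1
    i2, j2, k2 = obs2
    cov = 0.0
    if i1 == i2:                  # same rabbit: shared rabbit effect
        cov += var_rabbit
        if j1 == j2:              # same rabbit and dose: shared interaction effect
            cov += var_inter
            if k1 == k2:          # the same observation: shared error term
                cov += var_error
    return cov

# Hypothetical variance components, for illustration only.
v_rabbit, v_inter, v_error = 3.0, 2.0, 1.0

same_rabbit_same_dose = cov_mixed((1, 2, 1), (1, 2, 2), v_rabbit, v_inter, v_error)
different_rabbits = cov_mixed((1, 2, 1), (2, 2, 1), v_rabbit, v_inter, v_error)
same_rabbit_other_dose = cov_mixed((1, 2, 1), (1, 3, 1), v_rabbit, v_inter, v_error)
variance = cov_mixed((1, 2, 1), (1, 2, 1), v_rabbit, v_inter, v_error)
print(same_rabbit_same_dose, different_rabbits, same_rabbit_other_dose, variance)
```

Two measurements on the same rabbit at the same dose share the rabbit effect and the interaction effect but not the error, while measurements on different rabbits share nothing random.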

f) (3) For the model you wrote in part c, what is the variance of the average response (averaged over all measurements on all 5 rabbits) at dose = 12.50?

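$$\mathrm{Var}(\bar{Y}_{\cdot 2 \cdot}) = \mathrm{Var}\left(\frac{1}{10}\sum_{i=1}^{5}\sum_{k=1}^{2}\left[\mu + \alpha_i + \beta_2 + (\alpha\beta)_{i2} + \varepsilon_{i2k}\right]\right)$$

$$= \mathrm{Var}\left(\mu + \frac{1}{5}\sum_{i=1}^{5}\alpha_i + \beta_2 + \frac{1}{5}\sum_{i=1}^{5}(\alpha\beta)_{i2} + \frac{1}{10}\sum_{i=1}^{5}\sum_{k=1}^{2}\varepsilon_{i2k}\right)$$

$$= \frac{1}{5}\sigma_\alpha^2 + \frac{1}{5}\sigma_{\alpha\beta}^2 + \frac{1}{10}\sigma^2.$$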
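The three coefficients can be verified by brute force. The sketch below is an independent check, not part of the key: for the 10 observations at one dose it sums the covariance contributions of each variance component over all pairs and divides by $10^2$, using exact fractions. The function name `cov_unit` is illustrative.

```python
from fractions import Fraction
from itertools import product

def cov_unit(obs1, obs2, component):
    """Covariance of two observations at the same dose when the named
    variance component equals 1 and the others equal 0."""
    (i1, k1), (i2, k2) = obs1, obs2
    if component in ("rabbit", "rabbit*dose"):
        # Both terms are shared by all observations on the same rabbit at this dose.
        return Fraction(int(i1 == i2))
    # Residual error is shared only by an observation with itself.
    return Fraction(int(i1 == i2 and k1 == k2))

# (rabbit i, replicate k) for the 10 observations at a single dose.
observations = list(product(range(1, 6), range(1, 3)))
n = len(observations)

coefficients = {}
for component in ("rabbit", "rabbit*dose", "residual"):
    total = sum(cov_unit(a, b, component) for a in observations for b in observations)
    coefficients[component] = total / n**2   # Var(mean) = sum of all covariances / n^2

print(coefficients)
```

The fixed terms $\mu$ and $\beta_2$ do not appear because constants contribute nothing to a variance.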

Full points for getting the right answer as long as they show some work.

If they get the wrong answer, 1 point for each expression above.

g) (4) Fill in the blanks in the ANOVA table below (remembering that the data are balanced):

Source        DF   Mean Square   F         p-value
dose           5   16.316275     68.2321   p < 0.001
rabbit         4   0.3638        1.5214    p > 0.1
rabbit*dose   20   0.239129      0.5413    p > 0.5
Residual      30   0.441739      xxxx      xxxxxx

1 point for each F.

1 point for the p-values.
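The F ratios in the table can be reproduced from the mean squares once the denominators are chosen. In the balanced unconstrained mixed model, the expected mean squares for dose and rabbit both contain the rabbit*dose variance, so both are tested against the rabbit*dose mean square, and the interaction is tested against the residual. A short check under that assumption (the dictionary names are illustrative):

```python
# Mean squares from the ANOVA table in part g.
mean_squares = {
    "dose": 16.316275,
    "rabbit": 0.3638,
    "rabbit*dose": 0.239129,
    "Residual": 0.441739,
}

# Denominator for each test in the balanced unconstrained mixed model.
denominators = {
    "dose": "rabbit*dose",
    "rabbit": "rabbit*dose",
    "rabbit*dose": "Residual",
}

f_ratios = {
    source: mean_squares[source] / mean_squares[denom]
    for source, denom in denominators.items()
}

for source, f_value in f_ratios.items():
    print(f"{source:12s} F = {f_value:.4f} (denominator: {denominators[source]})")
```

Note that the interaction F ratio is below 1, which means the rabbit*dose mean square is smaller than the residual mean square.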

SAS gives the following table of Expected Mean Squares for the effects in the model:

Source        Expected Mean Square
dose          Var(Residual) + 2 Var(rabbit*dose) + Q(dose)
rabbit        Var(Residual) + 2 Var(rabbit*dose) + 12 Var(rabbit)
rabbit*dose   Var(Residual) + 2 Var(rabbit*dose)
Residual      Var(Residual)

h) (2) Using the method of moments (also called the ANOVA estimator), estimate the variance component for rabbit*dose. (Your answer should be a number, not a formula.)

$$\hat{\sigma}_{\alpha\beta}^2 = \frac{MS(RD) - MSE}{2} = \frac{0.239129 - 0.441739}{2} = -0.1013$$
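The same method-of-moments logic applies to every random term: set each observed mean square equal to its expected mean square and solve. A minimal sketch using the mean squares from part g and the coefficients (2 and 12) from the SAS expected mean squares table; the variable names are illustrative, and the rabbit component is computed only as an extra check that was not asked for in the exam.

```python
# Mean squares from the ANOVA table in part g.
ms = {"rabbit": 0.3638, "rabbit*dose": 0.239129, "Residual": 0.441739}

# Solve the expected mean square equations from the bottom of the table upward.
var_resid = ms["Residual"]
var_inter = (ms["rabbit*dose"] - ms["Residual"]) / 2     # coefficient 2 on Var(rabbit*dose)
var_rabbit = (ms["rabbit"] - ms["rabbit*dose"]) / 12     # coefficient 12 on Var(rabbit)

print(f"Var(Residual)    = {var_resid:.4f}")
print(f"Var(rabbit*dose) = {var_inter:.4f}")
print(f"Var(rabbit)      = {var_rabbit:.4f}")
```

The estimate for rabbit*dose is negative because its mean square is smaller than the residual mean square; the method of moments does not force estimates to be nonnegative.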

2 points for the right answer.

1 point for understanding that the answer has to involve both MS(RD) and MSE, even if they get the rest of the computation wrong.

i) (2) The investigators also decided to try a REML analysis of the data. The Type 3 test of fixed effects for this analysis is given below. Since the data are balanced, why is this test different from the test in the ANOVA table in part g?

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

dose 5 50 45.24 <.0001

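The test is different because the estimated variance component for rabbit*dose is negative.

REML estimates are known to be the same as the ANOVA estimates when the design is balanced and all of the estimated variance components are positive. Here the ANOVA estimate is negative, and REML does not allow a negative variance estimate, so the two analyses differ.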
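One way to see where the REML test comes from, offered as a plausible reading rather than a description of SAS internals: if the rabbit*dose variance is estimated as zero, the interaction and residual lines of the ANOVA table are effectively pooled into a single error term with 20 + 30 = 50 degrees of freedom. The sketch below forms that pooled mean square from the values in part g and compares the resulting F ratio with the REML output.

```python
# Values from the ANOVA table in part g.
ms_dose = 16.316275
ms_inter, df_inter = 0.239129, 20
ms_resid, df_resid = 0.441739, 30

# ASSUMPTION: with Var(rabbit*dose) set to 0, interaction and residual are pooled.
pooled_df = df_inter + df_resid
pooled_ms = (df_inter * ms_inter + df_resid * ms_resid) / pooled_df
f_pooled = ms_dose / pooled_ms

print(f"pooled error: df = {pooled_df}, MS = {pooled_ms:.4f}, F for dose = {f_pooled:.2f}")
```

Under this assumption the denominator degrees of freedom and the F value agree with the Type 3 test shown above (50 and 45.24), whereas the ANOVA test in part g uses the rabbit*dose mean square with 20 degrees of freedom.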

j) (2) An initial objective of the study is to determine if the effect of PBG on the change in blood pressure is linear. Since the response variable is now log(BPfinal/BPinitial), the investigators propose to determine if the response is linear in log(PBG). Is this the same hypothesis? Justify your response.

$\log(\text{BPfinal}/\text{BPinitial}) = \beta_0 + \beta_1 \log(\text{PBG})$

$\Rightarrow \text{BPfinal}/\text{BPinitial} = e^{\beta_0}\,\text{PBG}^{\beta_1}$

This is not the same hypothesis. The original objective was to test

$E(\text{BPfinal} - \text{BPinitial}) = \beta_0 + \beta_1\,\text{PBG}$.

The result on the log scale cannot be transformed to this model even if $\beta_1 = 1$.
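A small numerical illustration of why the two hypotheses differ. Under the log-ratio model with slope 1, the change in blood pressure is BPinitial times (e^{intercept} PBG - 1), so both its intercept and its slope in dose depend on the initial blood pressure. All numbers below are hypothetical and chosen only to make the arithmetic easy to follow; the function name is illustrative.

```python
import math

def bp_change_under_log_model(bp_initial, dose, beta0, beta1):
    """BPfinal - BPinitial implied by log(BPfinal/BPinitial) = beta0 + beta1*log(dose)."""
    bp_final = bp_initial * math.exp(beta0) * dose ** beta1
    return bp_final - bp_initial

# HYPOTHETICAL parameter values, not estimates from the exam data.
beta0, beta1 = math.log(0.02), 1.0
doses = (50, 100, 200)

changes = {
    bp0: [bp_change_under_log_model(bp0, d, beta0, beta1) for d in doses]
    for bp0 in (80.0, 100.0)
}

# Slope of the change with respect to dose, for each initial blood pressure.
slopes = {bp0: (vals[1] - vals[0]) / (doses[1] - doses[0]) for bp0, vals in changes.items()}
print(changes)
print(slopes)
```

For a fixed initial blood pressure the change is linear in dose, but the line is different for each initial blood pressure, so no single pair of constants reproduces the original model for the difference.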

It is not enough to say that this is a different model. E.g., if the model were $\log(\text{BPfinal}) = \beta_0 + \beta_1 \log(\text{PBG})$, the hypothesis is essentially the same. The problem is the ratio, not taking logs.

3. PBG is a drug known to raise blood pressure in some animals. 5 rabbits were used in a study of PBG. The response is log(BPfinal/BPinitial), computed from the blood pressure before and after the PBG dose was administered. The doses used were 6.25, 12.5, 25, 50, 100 and 200 mg.

Since the effects of PBG are very short-term, each rabbit was given each dose twice, in random order, so that there are 12 measurements (2 at each dose) for every rabbit. Within each rabbit, the measurements are thought to be independent.

Normal blood pressure in a rabbit is about 87 mm Hg. However, handling rabbits during experiments can raise their blood pressure, and this might have an effect on the blood pressure elevation induced by PBG. To control for this effect, the blood pressure measurement prior to administration of PBG was used as a covariate in the experiment (BPinit).

a) (4) Write a linear model for this experiment assuming that there is a linear effect of initial blood pressure and no interaction between initial blood pressure and the other factors in the model. In this model, rabbit should be a random effect and dose should be a fixed effect. Be sure to define all of your notation and include any constraints and distribution assumptions.

$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma (x_{ijk} - \bar{x}_{\cdot\cdot\cdot}) + \varepsilon_{ijk}$, $i = 1, 2, 3, 4, 5$; $j = 1, \ldots, 6$; $k = 1, 2$.

$Y_{ijk}$ is the response for the $k$th measurement of the $i$th rabbit given the $j$th dose,

$x_{ijk}$ is the initial blood pressure for the $k$th measurement of the $i$th rabbit given the $j$th dose,

$\gamma$ is the slope for the relation between $Y$ and $x$,

$\mu$ is the overall mean response,

$\alpha_i$ is the effect of the $i$th rabbit,

$\beta_j$ is the effect of the $j$th dose, with $\sum_{j=1}^{6} \beta_j = 0$,

$\varepsilon_{ijk}$ is the random error,

$\alpha_i$ and $\varepsilon_{ijk}$ are independent,

the $\alpha_i$ are iid $N(0, \sigma_\alpha^2)$ and the $\varepsilon_{ijk}$ are iid $N(0, \sigma^2)$.

1 point for adding the slope and covariate correctly to the model.

1 point for defining x.

2 points for adding the covariate to whatever model they came up with in problem 2, with the same constraints, etc. (i.e., if they made a mistake earlier, they should be using the same wrong model for the treatment and rabbit effects).

For the remainder of this problem you can assume that the necessary distribution assumptions are satisfied.

b) (6) Below is the Type 3 analysis of variance for the model that includes dose, initial blood pressure, the dose*initial blood pressure interaction, and random effects for rabbit and rabbit*dose. Test whether there is an interaction between dose and initial blood pressure.

Type 3 Analysis of Variance

Source        DF   Sum of Squares   Mean Square   Expected Mean Square
dose           5   2.2365           0.4473        Var(Residual) + 0.0081 Var(rabbit*dose) + Q(dose)
initBP         1   1.6302           1.6302        Var(Residual) + Q(initBP,initBP*dose)
initBP*dose    5   1.3515           0.2703        Var(Residual) + Q(initBP*dose)
rabbit         4   1.5817           0.3954        Var(Residual) + 1.6548 Var(rabbit*dose) + 9.9288 Var(rabbit)
rabbit*dose   20   4.1632           0.2081        Var(Residual) + 1.8035 Var(rabbit*dose)
Residual      24   5.0067           0.2086        Var(Residual)

Null and alternative hypotheses:

$H_0$: the slope for initial BP is the same at every dose (no interaction between dose and initial BP).

$H_a$: the slope for initial BP differs by dose (there is an interaction between dose and initial BP).

F-statistic: From the expected mean squares, we see that the appropriate error term for the interaction is the MSE.

$$F^* = \frac{MS(\text{initBP} \times \text{dose})}{MSE} = \frac{0.2703}{0.2086} = 1.29$$

d.f. for F: 5, 24

p-value: > 0.1

Conclusion:

There is no evidence of an interaction between initial BP and dose.

1 point for the null and alternative hypotheses.

½ point each for d.f.

1 point each for p-value and conclusion.

2 points for F: one for the denominator and one for the right value.
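The test can be checked from the sums of squares in the Type 3 table. A minimal sketch: the mean squares and F ratio are computed directly, and the upper-tail probability is approximated with Simpson's rule as a stand-in for a statistics package. The helper names `f_pdf` and `f_upper_tail` are illustrative.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F(d1, d2) distribution at x."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    return math.exp((d1 / 2) * math.log(d1 / d2)
                    + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
                    - log_beta)

def f_upper_tail(f_value, d1, d2, n_intervals=2000):
    """Approximate P(F > f_value) by Simpson's rule; assumes d1 > 2."""
    h = f_value / n_intervals
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3

# Sums of squares and degrees of freedom from the Type 3 table.
ss_inter, df_inter = 1.3515, 5      # initBP*dose
ss_resid, df_resid = 5.0067, 24     # Residual

ms_inter = ss_inter / df_inter
mse = ss_resid / df_resid
f_star = ms_inter / mse             # the interaction EMS contains only Var(Residual)
p_value = f_upper_tail(f_star, df_inter, df_resid)

print(f"MS = {ms_inter:.4f}, MSE = {mse:.4f}, F* = {f_star:.4f}, p = {p_value:.3f}")
```

The residual is the right denominator here because, under the null hypothesis, the expected mean square for initBP*dose reduces to Var(Residual).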

c) (2) The objective of the experiment is to determine whether there is an effect of dose of PBG. Why is it important to establish whether or not there is an interaction between dose and initial blood pressure?

Initial BP is a covariate in this model. If there is interaction between dose and initial blood pressure, then the dose effect depends on the initial blood pressure, so it would be difficult to make general inferences about the effects of PBG.

d) (2) Some investigators would also fit an interaction between initial blood pressure and the random effect "rabbit". What is the meaning of this interaction and should it be considered a fixed or random effect?

This means that there is a different slope for every rabbit. Since the rabbits are a random sample, it seems like the slopes should be random.

1 point for random.

1 point for some indication that the slope differs among rabbits.

e) (2) The investigators decided that the model with no interaction between initial blood pressure and the other factors was appropriate, and fitted an analysis of covariance model with random effects for rabbit and rabbit*dose, and fixed effects for dose.

Part of the resulting output is below. Is there a statistically significant dose effect? Briefly justify your answer.

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

dose 5 18.5 57.33 <.0001

initBP 1 41.6 25.46 <.0001

Yes, there is a significant dose effect, since the p-value for dose is <.0001.

f) (1) The investigators were very surprised to see fractional degrees of freedom for the F-statistic, pointing out that the tables include only whole numbers. Why are there fractional degrees of freedom?

In PROC MIXED the d.f. are computed by approximation (the Satterthwaite approximation or the Kenward-Roger method). These are often fractional.

1 point for mentioning Satterthwaite, Kenward-Roger, variance components, or something that indicates that we have an approximate F-test and random effects.
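To see how an approximation produces fractional degrees of freedom, here is a sketch of the Satterthwaite formula for a linear combination of mean squares, df = (sum of c_i MS_i)^2 / sum of (c_i MS_i)^2 / df_i. As an illustration it is applied to the Type 3 table in part b, where a denominator for the rabbit test has to be built from the rabbit*dose and residual mean squares because the coefficients 1.6548 and 1.8035 do not match. This is an illustrative calculation only; it does not reproduce the denominator degrees of freedom in the PROC MIXED output of part e, whose computation is more involved.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate degrees of freedom for sum(c_i * MS_i)."""
    numerator = sum(c * ms for c, ms in zip(coefs, mean_squares)) ** 2
    denominator = sum((c * ms) ** 2 / df for c, ms, df in zip(coefs, mean_squares, dfs))
    return numerator / denominator

# From the Type 3 table in part b:
#   E[MS(rabbit)]      = Var(Residual) + 1.6548 Var(rabbit*dose) + 9.9288 Var(rabbit)
#   E[MS(rabbit*dose)] = Var(Residual) + 1.8035 Var(rabbit*dose)
# A denominator with expectation Var(Residual) + 1.6548 Var(rabbit*dose) is
#   c * MS(rabbit*dose) + (1 - c) * MSE, with c = 1.6548 / 1.8035.
c = 1.6548 / 1.8035
ms_inter, df_inter = 0.2081, 20
mse, df_resid = 0.2086, 24

synthetic_ms = c * ms_inter + (1 - c) * mse
df_approx = satterthwaite_df([c, 1 - c], [ms_inter, mse], [df_inter, df_resid])

print(f"synthetic denominator MS = {synthetic_ms:.4f}, approximate df = {df_approx:.1f}")
```

The approximate degrees of freedom fall between those of the component mean squares and are generally not whole numbers.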

g) (2) The investigators wanted to use the results of this experiment to select levels of PBG to use in another experiment with a drug that is supposed to reduce the effect of PBG. The target is to find 2 levels of PBG for which the response will be close to log(10) and log(20) respectively. Should the cell means or adjusted cell means be used for selecting these 2 levels? Briefly justify your answer. You can assume that the new experiment will involve similar levels of handling of the rabbits, and therefore similar levels of the covariate.

Since the rabbits will be handled similarly, we expect that the distribution of initial BP will be similar to the distribution observed in this study. Since we cannot predict the initial BP of the new sample of rabbits, it seems like we should do our estimation at the mean level. The adjusted treatment means are the estimates of the treatment effect at the mean level of initial BP, and so seem like the appropriate estimates to use to select levels of PBG.

I will accept either answer for 2 points, assuming that the justification matches the answer.
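For reference, the adjustment itself is simple arithmetic: each dose mean is moved along the fitted covariate slope to the overall mean of initial BP. A minimal sketch, assuming a common slope and a balanced design; every number below is hypothetical and the function name `adjusted_means` is illustrative, since the exam does not report cell means, covariate means, or the slope estimate.

```python
def adjusted_means(cell_means, covariate_means, slope):
    """Adjust each cell mean to the overall covariate mean (common-slope ANCOVA).

    Assumes a balanced design, so the overall covariate mean is the
    average of the per-dose covariate means.
    """
    overall = sum(covariate_means.values()) / len(covariate_means)
    return {
        dose: cell_means[dose] - slope * (covariate_means[dose] - overall)
        for dose in cell_means
    }

# HYPOTHETICAL values for three doses, for illustration only.
cell_means = {25: 0.50, 50: 0.80, 100: 1.10}          # mean response by dose
covariate_means = {25: 90.0, 50: 95.0, 100: 100.0}    # mean initial BP by dose
slope = 0.01                                          # fitted slope on initial BP

adjusted = adjusted_means(cell_means, covariate_means, slope)
print(adjusted)
```

A dose whose average initial BP equals the overall average is left unchanged, and the adjustment grows with the distance between the two.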
