Final Exam Solutions


Stat 557 Fall 2000


A point value for each part of each problem is indicated below. These values sum to 100. You should have a corresponding number on your graded exam for each part of each question.

If you received no credit for a response you should have a zero recorded on your paper for that part of the question. Contact the instructor if no point value is indicated for any part of any problem. Some of the solutions are presented here in more detail than was expected of student responses. Furthermore, there are reasonable ways to approach some problems that are not addressed in this set of solutions.

1.

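(10 points) Use Fisher's exact test. With the margins of the table fixed, the p-value is a sum of hypergeometric probabilities over the tables that are at least as extreme as the observed table:

$$\text{p-value} = \sum_{x} \frac{\binom{4}{x}\binom{14}{10-x}}{\binom{18}{10}} = .2745$$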

The counts are too small to be sure that large sample chi-square or normal approximations to the distributions of test statistics provide accurate p-values. In this particular case you would not reject the null hypothesis using the chi-square approximation to the null distribution of the Pearson statistic.
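The exact p-value can be checked by direct enumeration. The following is a minimal sketch, not the grading procedure: the margins (4, 14, and 10 out of 18) are read from the binomial coefficients in the solution, while the observed cell count `x_obs = 1` is an assumption, chosen because it reproduces the reported value of .2745. The function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(x, row1=4, row2=14, col1=10):
    """Probability that the top-left cell equals x when all margins are fixed."""
    return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)

def fisher_two_sided(x_obs, row1=4, row2=14, col1=10):
    """Sum the probabilities of all tables no more probable than the observed one."""
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = hypergeom_pmf(x_obs, row1, row2, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p = hypergeom_pmf(x, row1, row2, col1)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p
    return total

# ASSUMPTION: the observed top-left count is 1 (not stated in the solution text).
p_value = fisher_two_sided(x_obs=1)
print(round(p_value, 4))
```

Under that assumption the tables with x = 0, 1, and 4 contribute to the sum, and the total is 12012/43758.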

2.

(10 points) First evaluate $\bar{p} = \frac{.04+.05}{2} = .045$ and

$$r = \sqrt{\frac{(2)(.045)(1-.045)}{(.04)(.96)+(.05)(.95)}} = 1.000291.$$

To achieve power of .80, use samples of at least

$$n = \frac{(Z_{.20} + Z_{.05}\, r)^2 \, [(.04)(.96) + (.05)(.95)]}{(.04 - .05)^2} = 5313$$

addresses from each supplier.
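The arithmetic in this sample size formula is easy to reproduce. The sketch below assumes a one-sided test at level .05 (matching $Z_{.05}$) and power .80 (matching $Z_{.20}$); the function name and argument names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two proportions (one-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper alpha percentile
    z_beta = NormalDist().inv_cdf(power)        # upper (1 - power) percentile
    p_bar = (p1 + p2) / 2
    unpooled = p1 * (1 - p1) + p2 * (1 - p2)
    r = sqrt(2 * p_bar * (1 - p_bar) / unpooled)
    n = (z_beta + z_alpha * r) ** 2 * unpooled / (p1 - p2) ** 2
    return r, n

r, n = sample_size_two_proportions(0.04, 0.05)
print(round(r, 6), ceil(n))
```

Because r is so close to 1 here, pooling the variance under the null hypothesis changes the required sample size very little.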

3.

(10 points) Two measurements are taken on each subject in the study, one at the beginning of the study and the other after six weeks. These measurements may be correlated. Create the following table for the treated subjects.

                       Inhaler Use after Six Weeks
Initial Inhaler Use    Moderate        Heavy
Moderate               $Y^T_{11}$      $Y^T_{12}$
Heavy                  $Y^T_{21}$      $Y^T_{22}$

The corresponding population probabilities for treated subjects are

                       Inhaler Use after Six Weeks
Initial Inhaler Use    Moderate          Heavy
Moderate               $\pi^T_{11}$      $\pi^T_{12}$
Heavy                  $\pi^T_{21}$      $\pi^T_{22}$

Create a corresponding table of counts for the controls:

                       Inhaler Use after Six Weeks
Initial Inhaler Use    Moderate        Heavy
Moderate               $Y^C_{11}$      $Y^C_{12}$
Heavy                  $Y^C_{21}$      $Y^C_{22}$

The corresponding population probabilities for controls are

                       Inhaler Use after Six Weeks
Initial Inhaler Use    Moderate          Heavy
Moderate               $\pi^C_{11}$      $\pi^C_{12}$
Heavy                  $\pi^C_{21}$      $\pi^C_{22}$

There is evidence that the drug is effective in reducing inhaler use if the proportion of treated subjects who change from heavy to moderate use of inhalers is larger than the proportion of subjects who change from moderate use to heavy use (then the proportion of moderate users at the end of six weeks is larger than the proportion of moderate users at the beginning of the study). We must compare the estimate of $\pi^T_{21} - \pi^T_{12}$ to the estimate of $\pi^C_{21} - \pi^C_{12}$, however, to adjust for possible effects of weather or other uncontrolled environmental factors.

Reject the null hypothesis and conclude that the drug is more effective than the placebo in reducing inhaler use if

$$Z = \frac{(p^T_{21} - p^T_{12}) - (p^C_{21} - p^C_{12})}{\sqrt{\dfrac{p^T_{21}(1-p^T_{21}) + p^T_{12}(1-p^T_{12}) + 2\,p^T_{21}\,p^T_{12}}{n_T} + \dfrac{p^C_{21}(1-p^C_{21}) + p^C_{12}(1-p^C_{12}) + 2\,p^C_{21}\,p^C_{12}}{n_C}}} > Z_\alpha$$

where $n_T$ is the number of treated subjects in the study, $n_C$ is the number of control subjects in the study, $p^T_{ij} = Y^T_{ij}/n_T$, $p^C_{ij} = Y^C_{ij}/n_C$, and $Z_\alpha$ is the upper $\alpha$ percentile of the standard normal distribution. This test has a type I error level of approximately $\alpha$ if $n_T$ and $n_C$ are sufficiently large. To compute the variance of the numerator of this test statistic we used

$$V\big((p^T_{21} - p^T_{12}) - (p^C_{21} - p^C_{12})\big) = V(p^T_{21} - p^T_{12}) + V(p^C_{21} - p^C_{12})$$

because treated individuals respond independently of the controls

$$= \frac{\pi^T_{21}(1-\pi^T_{21})}{n_T} + \frac{\pi^T_{12}(1-\pi^T_{12})}{n_T} + \frac{2\,\pi^T_{21}\,\pi^T_{12}}{n_T} + \frac{\pi^C_{21}(1-\pi^C_{21})}{n_C} + \frac{\pi^C_{12}(1-\pi^C_{12})}{n_C} + \frac{2\,\pi^C_{21}\,\pi^C_{12}}{n_C}$$
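As an illustration of how this statistic would be evaluated, here is a minimal sketch. The counts are hypothetical (the exam data are not reproduced in these solutions), and the function name is an assumption. Each table is laid out as rows for initial use (moderate, heavy) and columns for use after six weeks (moderate, heavy).

```python
from math import sqrt

def change_difference_z(treated, control):
    """Z statistic comparing net heavy-to-moderate change, treated vs. control."""
    def net_change(table):
        n = sum(sum(row) for row in table)
        p12 = table[0][1] / n   # moderate -> heavy
        p21 = table[1][0] / n   # heavy -> moderate
        diff = p21 - p12
        # Multinomial variance of p21 - p12; the covariance is -p21*p12/n,
        # so subtracting twice the covariance adds 2*p21*p12/n.
        var = (p21 * (1 - p21) + p12 * (1 - p12) + 2 * p21 * p12) / n
        return diff, var

    diff_t, var_t = net_change(treated)
    diff_c, var_c = net_change(control)
    return (diff_t - diff_c) / sqrt(var_t + var_c)

# HYPOTHETICAL counts, for illustration only.
treated = [[20, 5], [15, 10]]
control = [[18, 8], [9, 15]]
z = change_difference_z(treated, control)
print(round(z, 3))
```

The result would be compared with the upper percentile of the standard normal distribution, for example 1.645 for a test at the .05 level.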

4.(a)

(6 points) Let $Y_i = 1$ if the $i$-th case shows remission and let $Y_i = 0$ if the $i$-th case does not show remission. Since the subjects respond independently of each other, the $Y_i$'s are independent Bernoulli trials and the likelihood function is

$$L(\beta_0, \beta_1, \beta_2) = \prod_{i=1}^{170} \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i}$$

where

$$\pi_i = \frac{\exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})}{1 + \exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})}, \qquad i = 1, 2, \ldots, 170.$$
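The logarithm of this likelihood is what would be maximized in practice. The following sketch evaluates it for a given parameter vector; the data rows are hypothetical placeholders, not the 170 cases from the exam, and the function names are assumptions.

```python
from math import exp, log

def remission_probability(beta, x1, x2):
    """pi_i = exp(eta) / (1 + exp(eta)) with eta = b0 + b1*x1 + b2*x2."""
    eta = beta[0] + beta[1] * x1 + beta[2] * x2
    return 1.0 / (1.0 + exp(-eta))

def log_likelihood(beta, cases):
    """Sum of Y*log(pi) + (1 - Y)*log(1 - pi) over independent Bernoulli cases."""
    total = 0.0
    for y, x1, x2 in cases:
        pi = remission_probability(beta, x1, x2)
        total += y * log(pi) + (1 - y) * log(1 - pi)
    return total

# HYPOTHETICAL cases as (Y, X1, X2), for illustration only.
cases = [(1, 0.3, 98.0), (0, 0.1, 101.0), (1, 0.4, 99.5), (0, 0.2, 100.0)]
print(log_likelihood((0.0, 0.0, 0.0), cases))
```

With all coefficients equal to zero every $\pi_i$ is .5, so the log-likelihood reduces to the number of cases times $\log(.5)$, which gives a quick sanity check.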

4.(b)

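(6 points) Solve the equation

$$\log\left(\frac{.90}{1-.90}\right) = \hat\beta_0 + \hat\beta_1 X_{1,.90} + \hat\beta_2(100)$$

to obtain

$$\hat{X}_{1,.90} = \frac{\log\left(\frac{.90}{1-.90}\right) - \hat\beta_0 - \hat\beta_2(100)}{\hat\beta_1} = 0.206$$

4.(c)

(8 points) Use the delta method. First evaluate

$$G = \begin{bmatrix} \dfrac{\partial X_{1,.90}}{\partial \beta_0} \\[2ex] \dfrac{\partial X_{1,.90}}{\partial \beta_1} \\[2ex] \dfrac{\partial X_{1,.90}}{\partial \beta_2} \end{bmatrix} = \begin{bmatrix} \dfrac{-1}{\hat\beta_1} \\[2ex] \dfrac{-\log\left(\frac{.90}{1-.90}\right) + \hat\beta_0 + \hat\beta_2(100)}{\hat\beta_1^2} \\[2ex] \dfrac{-100}{\hat\beta_1} \end{bmatrix}$$

at $\hat\beta = (45.5,\ 33.0,\ -0.5)^T$ to obtain $\hat{G} = (-0.030303,\ -0.006242,\ -3.030303)^T$ and

$$S^2 = \hat{G}' \hat{V} \hat{G} = .0057882,$$

where $\hat{V}$ is the estimated covariance matrix of $\hat\beta$. Since the standard regularity conditions are satisfied and the sample size is not too small, $\hat{X}_{1,.90}$ is approximately distributed as a normal random variable with mean $X_{1,.90}$ and estimated variance $S^2$.

The standard error is $S = \sqrt{.0057882} = .0761$.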
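The delta method steps can be sketched in a few lines. Everything numerical in this example is hypothetical: the coefficient vector and the diagonal covariance matrix are placeholders chosen only to exercise the code, since the estimated covariance matrix is not reproduced in these solutions. The function names are assumptions.

```python
from math import log, sqrt

def x1_for_target(beta, p=0.90, x2=100.0):
    """Value of X1 at which the fitted remission probability equals p."""
    b0, b1, b2 = beta
    return (log(p / (1 - p)) - b0 - b2 * x2) / b1

def gradient(beta, p=0.90, x2=100.0):
    """Partial derivatives of x1_for_target with respect to (b0, b1, b2)."""
    b1 = beta[1]
    x1 = x1_for_target(beta, p, x2)
    return [-1.0 / b1, -x1 / b1, -x2 / b1]

def delta_variance(grad, cov):
    """Quadratic form G' V G."""
    k = len(grad)
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))

# HYPOTHETICAL estimates and covariance matrix, for illustration only.
beta_hat = (-4.0, 33.0, 0.02)
cov_hat = [[1.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.0001]]

g = gradient(beta_hat)
standard_error = sqrt(delta_variance(g, cov_hat))
print([round(v, 6) for v in g], round(standard_error, 4))
```

Note that the second gradient component equals $-\hat{X}_{1,.90}/\hat\beta_1$, which is the same expression as the middle entry of G written more compactly.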

4.(d)

(8 points) Since the response is binary and there may only be one subject for each set of $(X_1, X_2)$ values that appear in the data set, we would most likely be unable to reliably use the Pearson chi-square or log-likelihood ratio (deviance) statistics to test the fit of this model against the general alternative. There are other ways to assess the fit of the model and determine how to improve the model.

To determine if the proposed model uses all of the relevant information in $X_1$ and $X_2$ for estimating the conditional probability of remission, use a clustering algorithm like the centroid method to form about 15 groups of subjects so that subjects within a group have similar $(X_1, X_2)$ values. Add a group effect to the proposed model. If this substantially improves the estimation of the conditional probability of remission, then you need to look for other functions of $X_1$ and $X_2$ to include in the model.

Examine smoothed residual plots or smoothed partial residual plots. This would provide some information about whether or not the log-odds of remission has a curved relationship with either $X_1$ or $X_2$, and the shape of the curves. Using gam( ) in S-PLUS to fit a generalized additive model also provides information of this type. These methods generally do not provide much information about whether or not some interaction or joint function of $X_1$ and $X_2$, e.g. $X_1 X_2$, should be included in the model.

A classification tree could provide some insight with regard to possible threshold effects or interaction effects that should be included in the model. In this case it would be better to use the CART approach than the CHAID approach because the explanatory variables are interval variables.

One could create new explanatory variables as functions of $X_1$ and $X_2$, e.g. $X_1 X_2$, $X_1^2$, $\log(X_2)$, .... Then use stepwise procedures or other searching procedures to select variables to include in the model.

Good ideas about the form of the model may be obtained from subject matter theory or searching the literature on the subject matter to review how others have modeled similar data.

Other methods that we did not discuss in this course, such as projection pursuit, multivariate adaptive regression splines, and neural nets, may provide some insight about how to better model the data or at least provide improved predictions.

5.(A)

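(6 points)

$$\log(m_{ijk\ell}) = \lambda + \lambda^T_i + \lambda^S_j + \lambda^C_k + \lambda^E_\ell + \lambda^{TS}_{ij} + \lambda^{TC}_{ik} + \lambda^{TE}_{i\ell} + \lambda^{CE}_{k\ell} + \lambda^{TCE}_{ik\ell}$$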

5.(B)(i)

(6 points) df $= (4)(3)(2)(2) - (1+3+2+1+1+6+3+3+2+2+6) = 48 - 30 = 18$
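The parameter count can be reproduced mechanically. In this sketch the list of model terms is an inference from the sizes in the sum above and from the $\lambda$-terms used in part (B)(ii): four main effects, the two-factor terms TS, TC, TE, SC, SE, and the three-factor term TSC. The level counts (4 accident types, 3 severity levels, 2 car sizes, 2 ejection levels) come from the product $(4)(3)(2)(2)$.

```python
from math import prod

# Levels per factor: T = accident type, S = severity, C = car size, E = ejection.
levels = {"T": 4, "S": 3, "C": 2, "E": 2}

# ASSUMPTION: terms of the log-linear model in part (B), inferred from the df sum.
terms = ["T", "S", "C", "E", "TS", "TC", "TE", "SC", "SE", "TSC"]

def free_parameters(term):
    """Number of non-redundant parameters for one lambda term."""
    return prod(levels[factor] - 1 for factor in term)

n_cells = prod(levels.values())
n_parameters = 1 + sum(free_parameters(t) for t in terms)  # 1 for the intercept
df = n_cells - n_parameters
print(n_cells, n_parameters, df)
```

Each term contributes the product of (levels minus one) over its factors, which is how the entries 3, 2, 1, 1, 6, 3, 3, 2, 2, 6 arise.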

5.(B)(ii)

(6 points) Given the type of accident and the size of the car, the conditional odds that the driver is severely injured are

$$\frac{\pi_{i3k2}}{\pi_{i1k2} + \pi_{i2k2}} = \frac{m_{i3k2}}{m_{i1k2} + m_{i2k2}} = \frac{\exp(\lambda^S_3 + \lambda^{SC}_{3k} + \lambda^{SE}_{32} + \lambda^{TS}_{i3} + \lambda^{TSC}_{i3k})}{1 + \exp(\lambda^S_2 + \lambda^{SC}_{2k} + \lambda^{SE}_{22} + \lambda^{TS}_{i2} + \lambda^{TSC}_{i2k})}$$

when the driver is ejected from the automobile during the accident and

$$\frac{\pi_{i3k1}}{\pi_{i1k1} + \pi_{i2k1}} = \frac{m_{i3k1}}{m_{i1k1} + m_{i2k1}} = \frac{\exp(\lambda^S_3 + \lambda^{SC}_{3k} + \lambda^{TS}_{i3} + \lambda^{TSC}_{i3k})}{1 + \exp(\lambda^S_2 + \lambda^{SC}_{2k} + \lambda^{TS}_{i2} + \lambda^{TSC}_{i2k})}$$

when the driver is not ejected from the automobile during the accident. The conditional odds ratio

$$\frac{\pi_{i3k2} / (\pi_{i1k2} + \pi_{i2k2})}{\pi_{i3k1} / (\pi_{i1k1} + \pi_{i2k1})} = e^{\lambda^{SE}_{32}} \; \frac{1 + \exp(\lambda^S_2 + \lambda^{SC}_{2k} + \lambda^{TS}_{i2} + \lambda^{TSC}_{i2k})}{1 + \exp(\lambda^S_2 + \lambda^{SC}_{2k} + \lambda^{TS}_{i2} + \lambda^{TSC}_{i2k} + \lambda^{SE}_{22})}$$

depends on the type of accident and car size. Maximum likelihood estimates are obtained by inserting the mle's for the required $\lambda$-terms into this formula. These are shown in the following table. They are nearly homogeneous and they suggest about a 10 or 11 percent increase in the odds of severe injury when the driver is ejected from the automobile during the accident.

                          Accident Type
            Collision      Collision
Car size    no rollover    no rollover    rollover    rollover
Compact     1.115          1.112          1.104       1.107
Standard    1.112          1.113          1.099       1.105

In answering this question, most students excluded the moderately severe category and conditioned on the remaining two categories: not severe (the baseline category) and severe. Then, the mle for the conditional odds ratio of a severe driver injury is simply $\exp(\hat\lambda^{SE}_{32}) = \exp(0.205) = 1.2275$, with standard error $(1.23)(.017) = .0209$. This is consistent across all levels of car size and accident type.

5.(B)(iii)

(6 points) If you reported the second solution to part B.(ii), a good way to construct a 95% confidence interval for that conditional odds ratio is to use the large sample normal distribution to first construct a 95% confidence interval for $\lambda^{SE}_{32}$, i.e.,

$$0.206 \pm (1.96)(.017) \;\Rightarrow\; (.17268,\ .23932)$$

and apply the exponential function to the endpoints of this interval to obtain

$$(1.188,\ 1.270).$$

Conditioning on just the severe and not severe driver injury outcomes, the odds that the driver is severely injured are about 19 to 27 percent greater when the driver is ejected from the automobile during the accident. This is consistent across car sizes and accident types.
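The interval arithmetic can be verified with a short script. This is a sketch of the transformation described above, using the estimate 0.206 and standard error .017 quoted in the solution; the variable names are hypothetical.

```python
from math import exp

estimate = 0.206        # estimate of the log odds ratio parameter
standard_error = 0.017
z = 1.96                # upper .025 percentile of the standard normal

# Interval on the log scale, then back-transformed to the odds ratio scale.
lower = estimate - z * standard_error
upper = estimate + z * standard_error
odds_ratio_interval = (exp(lower), exp(upper))

print(round(lower, 5), round(upper, 5))
print(tuple(round(v, 3) for v in odds_ratio_interval))
```

Building the interval on the log scale and exponentiating keeps both endpoints positive, which an interval formed directly on the odds ratio scale would not guarantee.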

If you reported the first solution to part B.(ii), you would have a different confidence interval for each combination of levels for car size and accident type. The confidence interval shown above would be a confidence interval for the situation where a driver of a compact car has a non-rollover collision with another vehicle. For other combinations of accident types and car sizes, you could show how the delta method could be used to obtain approximate confidence intervals. The evaluation of these confidence intervals would require the estimated covariance matrix of the parameters (the inverse of the estimated Fisher information matrix) which was not provided in the statement of the problem.

5.(B)(iv)

(12 points) The estimates of the $\lambda^{TS}_{ij}$ parameters are shown in the following table.

They provide information about associations between accident type and severity of the accident for drivers of compact automobiles. The positive values in the second row reveal that when no rollover is involved the odds of moderately severe injury versus no severe injury to the driver of a compact automobile are from 2 to 14 percent higher in a collision with an object versus a collision with another vehicle. The odds of moderately severe injury versus no severe injury to the driver of a compact automobile are greatest when the automobile rolls over during the accident (from 15 to 31 percent greater than when a compact automobile does not roll over in a collision with another vehicle). The last row of the table indicates that the odds of severe injury versus no severe injury to the driver of a compact car are about the same for the first three types of accidents but from 5 to 10 percent higher for a rollover involving a collision.

                                   Accident Type
Injury to            Collision      Collision
the driver           no rollover    no rollover    rollover    rollover
Not severe           0              0              0           0
Moderately severe    0              0.073          0.230       0.177
Severe               0              -0.014         0.003       0.075

Adding the estimates of the $\lambda^{TSC}_{ij2}$ parameters to the estimates of the $\lambda^{TS}_{ij}$ parameters yields information about associations between accident types and severity of injuries to drivers of standard size cars. The results are similar to those for drivers of compact cars with the exception that there is essentially no difference in the odds of moderately severe injuries for collisions with an object or collisions with another vehicle when the automobile does not roll over during the accident.

                                   Accident Type
Injury to            Collision      Collision
the driver           no rollover    no rollover    rollover    rollover
Not severe           0              0              0           0
Moderately severe    0              -0.003         0.271       0.153
Severe               0              -0.001         0.019       0.087
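Estimates on the log scale are easier to interpret after exponentiating. The sketch below converts the point estimates for compact cars into percent changes in odds relative to the first accident type. It gives point estimates only; the percent ranges quoted in the discussion are intervals and need standard errors that are not reproduced here. The dictionary layout and function name are hypothetical.

```python
from math import exp

# Point estimates for compact cars, one value per accident type (columns 1 to 4).
compact_estimates = {
    "Moderately severe": [0.0, 0.073, 0.230, 0.177],
    "Severe": [0.0, -0.014, 0.003, 0.075],
}

def percent_change_in_odds(log_odds_ratio):
    """Percent change in odds implied by a log odds ratio."""
    return 100.0 * (exp(log_odds_ratio) - 1.0)

for severity, estimates in compact_estimates.items():
    changes = [round(percent_change_in_odds(e), 1) for e in estimates]
    print(severity, changes)
```

For small estimates the percent change is close to 100 times the estimate itself, which is why values such as 0.073 and 0.075 correspond to increases of a little under 8 percent.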

5.(C)

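(6 points) Using severity of driver injury as the response variable, the log-linear model in part (B) is one of the log-linear models that corresponds to the following logistic regression model:

Severe vs. not severe:

$$\log\left(\frac{\pi_{i3k\ell}}{\pi_{i1k\ell}}\right) = \log\left(\frac{m_{i3k\ell}}{m_{i1k\ell}}\right) = \lambda^S_3 + \lambda^{TS}_{i3} + \lambda^{SC}_{3k} + \lambda^{SE}_{3\ell} + \lambda^{TSC}_{i3k}$$

Moderately severe vs. not severe:

$$\log\left(\frac{\pi_{i2k\ell}}{\pi_{i1k\ell}}\right) = \log\left(\frac{m_{i2k\ell}}{m_{i1k\ell}}\right) = \lambda^S_2 + \lambda^{TS}_{i2} + \lambda^{SC}_{2k} + \lambda^{SE}_{2\ell} + \lambda^{TSC}_{i2k}$$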

Other solutions are obtained from other choices of the logits, such as continuation-ratio logits.

Some students selected driver ejection as the response, but it is more reasonable to think of driver ejection as having an effect on severity of driver injury rather than thinking of severity of driver injury as affecting driver ejection. Severity of driver injury is a more natural choice for the response, although there could be special situations where you might want to estimate the probability that a driver was ejected given the information on the other three variables.

Scores:

Here is a stem-leaf display of the scores for this exam. There were 100 total points for this exam.

9 |
8 | 5568
8 | 0012234
7 | 678889
7 | 0233
6 | 56778899
6 | 00244
5 | 69
5 | 344
