STA 6505, Fall 2008, Homework #1 Solutions

STA 6505, Fall 2008, Homework #2 Solutions
Ch. 2: Exercises 2.6, 2.7, 2.8, 2.12, 2.15, 2.18
2.6. A newspaper article preceding the 1994 World Cup semifinal match between Italy and
Bulgaria stated that “Italy is favored 10-11 to beat Bulgaria, which is rated 10-3 to reach the
final.” Suppose that this means that the odds that Italy wins are 1.1, and the odds that Bulgaria
wins are 0.3. Find the probability that each team wins, and comment.

The probability may be found from the odds as $\pi = \frac{\Omega}{1 + \Omega}$. Hence, the probability that Italy wins
is $\frac{1.1}{1 + 1.1} = 0.5238$, and the probability that Bulgaria wins is $\frac{0.3}{1 + 0.3} = 0.2308$. These
two probabilities do not sum to 1. Hence, unless there is the possibility of a tie (not in a World
Cup semifinal), the two odds quoted do not agree.
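As a quick check of the arithmetic above, the conversion from odds to probability can be sketched in Python. The function name `odds_to_prob` is ours, introduced only for illustration.

```python
def odds_to_prob(odds: float) -> float:
    """Convert odds = pi / (1 - pi) into the probability pi."""
    return odds / (1.0 + odds)


# Odds quoted in the exercise: 1.1 for Italy, 0.3 for Bulgaria.
p_italy = odds_to_prob(1.1)
p_bulgaria = odds_to_prob(0.3)
total = p_italy + p_bulgaria

print(f"Italy:    {p_italy:.4f}")
print(f"Bulgaria: {p_bulgaria:.4f}")
print(f"Sum:      {total:.4f}")  # falls short of 1, so the quoted odds are inconsistent
```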
2.7. In the United States, the estimated annual probability that a woman over the age of 35 dies
of lung cancer equals 0.001304 for current smokers and 0.000121 for nonsmokers (M. Pagano
and K. Gauvreau, Principles of Biostatistics, Duxbury Press, Pacific Grove, CA. 1993, p. 134).
a) Find and interpret the difference of proportions and the relative risk. Which measure is more
informative for this data? Why?
The difference of proportions is 0.001304 - 0.000121 = 0.001183, or about 0.0012. The annual
probability of dying of lung cancer is about 0.12 percentage points higher for female smokers
over 35 than for female nonsmokers over 35, a very small absolute difference.
The relative risk is R.R. = 0.001304/0.000121 = 10.78. The probability that a woman over 35
dies of lung cancer is 10.78 times as high for smokers as for nonsmokers. The relative risk
is more informative for these data, since the difference of proportions makes it appear
that there is almost no association. Dying of lung cancer is a rare event, so both proportions
are close to zero and their difference is necessarily small, even though one proportion is
more than ten times the other.
b) Find and interpret the odds ratio. Explain why the relative risk and odds ratio take similar
values.
The odds ratio is $\theta$ = (0.001304/0.998696)/(0.000121/0.999879) = 10.79. The odds of a woman over
35 dying of lung cancer if she is a smoker are 10.79 times as large as the odds of a woman over
35 dying of lung cancer if she is a nonsmoker. The odds ratio and the relative risk are close in
value for rarely occurring events: $\theta = \text{R.R.} \times \frac{1 - \pi_2}{1 - \pi_1}$, and the second factor is
close to 1 when both proportions of the outcome (dying of lung cancer) are close to zero.
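The three measures in parts (a) and (b) can be recomputed with a short Python sketch. The variable names are ours; the two probabilities are the ones quoted in the exercise.

```python
# Estimated annual probabilities of death from lung cancer (women over 35).
p_smoker = 0.001304
p_nonsmoker = 0.000121

diff = p_smoker - p_nonsmoker
relative_risk = p_smoker / p_nonsmoker
odds_ratio = (p_smoker / (1 - p_smoker)) / (p_nonsmoker / (1 - p_nonsmoker))

# The odds ratio equals the relative risk times this factor,
# which is close to 1 when both probabilities are small.
correction = (1 - p_nonsmoker) / (1 - p_smoker)

print(f"difference of proportions: {diff:.6f}")
print(f"relative risk:             {relative_risk:.2f}")
print(f"odds ratio:                {odds_ratio:.2f}")
print(f"odds ratio / relative risk: {correction:.6f}")
```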
2.8. For adults who sailed on the Titanic on its fateful voyage, the odds ratio between gender
(female, male) and survival (yes, no) was 11.4. (For data, see R. J. M. Dawson, Journal of
Statistics Education, 3, 1995).
a) What is wrong with the interpretation, “The probability of survival for females was 11.4 times
that for males?”
The statement confuses the odds ratio with the relative risk. The correct interpretation would be
"The odds of survival for females were 11.4 times those for males." The odds ratio and the relative
risk would not be nearly the same in this case, since the event in question (survival) was not a
sufficiently rare event.
b) The odds of survival for females equaled 2.9. For each gender, find the proportion who
survived.

Since the odds ratio was $\theta = \frac{\Omega_F}{\Omega_M} = 11.4$, and $\Omega_F = 2.9$, then $\Omega_M = \frac{2.9}{11.4} = 0.25$. The
proportion of females who survived was then $\pi_F = \frac{\Omega_F}{1 + \Omega_F} = 0.74$. The proportion of males
who survived was $\pi_M = \frac{\Omega_M}{1 + \Omega_M} = 0.20$.
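The calculation in part (b), and the reason the interpretation in part (a) fails, can be sketched in Python. The variable names are ours; the inputs 11.4 and 2.9 are from the exercise.

```python
odds_ratio = 11.4   # female v. male odds of survival
odds_female = 2.9   # odds of survival for females

# The odds ratio is odds_female / odds_male, so solve for the male odds.
odds_male = odds_female / odds_ratio

# Convert each odds to a proportion: pi = odds / (1 + odds).
p_female = odds_female / (1 + odds_female)
p_male = odds_male / (1 + odds_male)

# The ratio of survival probabilities is far smaller than the odds ratio.
relative_risk = p_female / p_male

print(f"male odds:     {odds_male:.2f}")
print(f"P(survive | F): {p_female:.2f}")
print(f"P(survive | M): {p_male:.2f}")
print(f"relative risk:  {relative_risk:.2f}")
```

The relative risk computed here is well below 11.4, which is why the odds ratio cannot be read as a ratio of probabilities for these data.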
2.12. Table 2.10 refers to applicants to graduate school at the University of California at
Berkeley, for fall, 1973. It presents admissions decisions by gender of applicant for the six
largest graduate departments. Denote the three variables by A = whether admitted, G = gender,
and D = Department. Find the sample AG conditional odds ratios and the marginal odds ratio.
Interpret, and explain why they give such different indications of AG association.
Let the variable A be coded 1 = “Yes”, 2 = “No”. Let the variable G be coded 1 = “Male”, 2 =
“Female”.
For Department A, the conditional odds ratio is
$\hat{\theta}_A = \frac{n_{11|A}\, n_{22|A}}{n_{12|A}\, n_{21|A}} = \frac{(512)(19)}{(89)(313)} = 0.3492$.
For Department B, the conditional odds ratio is
$\hat{\theta}_B = \frac{n_{11|B}\, n_{22|B}}{n_{12|B}\, n_{21|B}} = \frac{(353)(8)}{(17)(207)} = 0.8025$.
For Department C, the conditional odds ratio is
$\hat{\theta}_C = \frac{n_{11|C}\, n_{22|C}}{n_{12|C}\, n_{21|C}} = \frac{(120)(391)}{(202)(205)} = 1.1331$.
For Department D, the conditional odds ratio is
$\hat{\theta}_D = \frac{n_{11|D}\, n_{22|D}}{n_{12|D}\, n_{21|D}} = \frac{(138)(244)}{(131)(279)} = 0.9213$.
For Department E, the conditional odds ratio is
$\hat{\theta}_E = \frac{n_{11|E}\, n_{22|E}}{n_{12|E}\, n_{21|E}} = \frac{(53)(299)}{(94)(138)} = 1.2216$.
For Department F, the conditional odds ratio is
$\hat{\theta}_F = \frac{n_{11|F}\, n_{22|F}}{n_{12|F}\, n_{21|F}} = \frac{(22)(317)}{(24)(351)} = 0.8279$.
The marginal odds ratio is
$\hat{\theta} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}} = \frac{(1198)(1278)}{(557)(1493)} = 1.8411$.
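The marginal odds ratio is greater than any of the conditional odds ratios. Within departments, the
odds of admission for males are close to, or below, those for females, yet the marginal table
suggests that males had odds of admission about 1.84 times those of females. The discrepancy is
accounted for by a relatively strong association between the variable D (department) and each of
the other two variables, A = Admitted? and G = Gender. Admission rates differ greatly by department,
and so does the gender mix of applicants. For example, Department A admitted 601 of its 933
applicants, of whom only 108 were female, while Department F admitted 46 of its 714 applicants,
of whom 341 were female.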
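The conditional and marginal odds ratios above can be recomputed with a short Python sketch. The counts are the ones used in the calculations above; the tuple layout and the helper name `odds_ratio` are our own choices for illustration.

```python
# Counts per department, in the order:
# (male admitted, male rejected, female admitted, female rejected)
counts = {
    "A": (512, 313, 89, 19),
    "B": (353, 207, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (138, 279, 131, 244),
    "E": (53, 138, 94, 299),
    "F": (22, 351, 24, 317),
}


def odds_ratio(male_adm, male_rej, female_adm, female_rej):
    """Odds of admission for males divided by odds of admission for females."""
    return (male_adm * female_rej) / (female_adm * male_rej)


# Conditional odds ratios: one 2 x 2 table per department.
conditional = {dept: odds_ratio(*c) for dept, c in counts.items()}

# Marginal odds ratio: collapse the six tables by summing cell by cell.
marginal_counts = tuple(sum(c[k] for c in counts.values()) for k in range(4))
marginal = odds_ratio(*marginal_counts)

for dept, value in conditional.items():
    print(f"Department {dept}: {value:.4f}")
print(f"Marginal:     {marginal:.4f}")
```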
2.15. At each age level, the death rate is higher in South Carolina than in Maine, but overall, the
death rate is higher in Maine. Explain how this could be possible. (For data, see H. Wainer,
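Chance, 12:44, 1999).
This is possible if the population of Maine is relatively older than that of South Carolina. Death
rates rise steeply with age, so a state with a larger share of its residents at the older age levels
can have a higher overall death rate even though its death rate is lower at every age level.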
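The mechanism can be illustrated with a small Python sketch. All numbers below are hypothetical, invented only to show how the reversal can arise; they are not the Maine or South Carolina data, and the two age groups are a simplification.

```python
# HYPOTHETICAL death rates and age shares, chosen only to illustrate the
# reversal; they are NOT the actual Maine / South Carolina figures.
death_rate = {
    "South Carolina": {"younger": 0.002, "older": 0.050},
    "Maine": {"younger": 0.001, "older": 0.040},
}
age_share = {
    "South Carolina": {"younger": 0.90, "older": 0.10},
    "Maine": {"younger": 0.70, "older": 0.30},
}

# The overall death rate is the age-share-weighted average of the age-specific rates.
overall = {
    state: sum(age_share[state][g] * death_rate[state][g] for g in death_rate[state])
    for state in death_rate
}

for state, rate in overall.items():
    print(f"{state}: overall death rate {rate:.4f}")
```

In this made-up example South Carolina has the higher rate in both age groups, but Maine has the higher overall rate because more of its population is in the older group.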
2.18. Table 2.11 refers to a retrospective study of lung cancer and tobacco smoking among
patients in several English hospitals. The table compares male lung cancer patients with control
patients having other diseases, according to the average number of cigarettes smoked daily over
a 10-year period preceding the onset of the disease. The lung cancer group has n = 1357, and the
control group has n = 1357.
a) Find the sample odds of lung cancer at each smoking level and the five odds ratios that pair
each level of smoking with no smoking. As smoking increases, is there a trend? Interpret.
Daily Avg. No. of Cigarettes    Odds of Lung Cancer    Ln(Odds)
None                            0.114754               -2.164964
<5                              0.426357               -0.852479
5 – 14                          0.857895               -0.153274
15 – 24                         1.102088                0.097207
25 – 49                         1.902597                0.643220
50 +                            3.166667                1.152680
The odds ratios, each computed as $\hat{\theta} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$ from the 2 × 2 table that pairs the given
smoking level with no smoking, are:
For < 5 v. None, the odds ratio is $\hat{\theta} = 3.7154$.
For 5 – 14 v. None, the odds ratio is $\hat{\theta} = 7.4759$.
For 15 – 24 v. None, the odds ratio is $\hat{\theta} = 9.6039$.
For 25 – 49 v. None, the odds ratio is $\hat{\theta} = 16.5798$.
For 50 + v. None, the odds ratio is $\hat{\theta} = 27.5953$.
The odds ratios increase steadily with the level of smoking. They must be interpreted carefully,
due to the retrospective nature of the study. Consider the random experiment of randomly selecting
a patient from a subset of the study group consisting of two levels of smoking. As the level of
smoking increases, the odds that the patient is in the lung cancer group, rather than the control
group, increase.
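Each odds ratio against the "None" level is the sample odds at that level divided by the sample odds at "None", so the five values can be recovered from the odds table alone. A minimal Python sketch, using the tabulated odds (rounded to six decimals) rather than the raw counts of Table 2.11:

```python
levels = ["None", "<5", "5-14", "15-24", "25-49", "50+"]
odds = [0.114754, 0.426357, 0.857895, 1.102088, 1.902597, 3.166667]

# Odds ratio of each smoking level against the "None" baseline.
baseline = odds[0]
odds_ratios = {lvl: o / baseline for lvl, o in zip(levels[1:], odds[1:])}

for lvl, value in odds_ratios.items():
    print(f"{lvl:>6} v. None: {value:.4f}")

# The trend: odds ratios increase with the level of smoking.
values = list(odds_ratios.values())
is_increasing = all(a < b for a, b in zip(values, values[1:]))
print("increasing with smoking level:", is_increasing)
```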
b) If the log odds of lung cancer is linearly related to smoking level, the log-odds in row i
satisfies $\ln \Omega_i = \alpha + \beta i$. Show that this implies that the local odds ratios are identical.
If the log-odds is a linear function of smoking level, i, then the log-odds ratio between successive
smoking levels is $\ln \Omega_{i+1} - \ln \Omega_i = [\alpha + \beta(i + 1)] - [\alpha + \beta i] = \beta$, regardless of the value of i.
Hence, the odds ratio between successive levels of smoking is
$\theta_{i+1,\,i} = \frac{e^{\alpha + \beta(i+1)}}{e^{\alpha + \beta i}} = e^{\beta}$. Thus the local odds ratios are identical.
It can be seen from the graph below that the above linear relationship holds, approximately.
Hence, we may conclude that the local odds ratios are approximately identical.
[Figure: Plot of Ln(Odds) v. Smoking Level. Horizontal axis: i; vertical axis: Ln(Odds). Fitted line: y = 0.609 x - 2.346, R2 = 0.939.]
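The fitted line reported with the plot can be reproduced by ordinary least squares on the log-odds. The sketch below assumes the smoking levels are scored i = 1, ..., 6 (an assumption on our part; it is the scoring that reproduces the reported slope and intercept) and uses only the standard library.

```python
import math

odds = [0.114754, 0.426357, 0.857895, 1.102088, 1.902597, 3.166667]

# Assumed scores for the six smoking levels: i = 1, ..., 6.
x = list(range(1, len(odds) + 1))
y = [math.log(o) for o in odds]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope, intercept, and coefficient of determination.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
r_squared = sxy ** 2 / (sxx * syy)

# Sample local odds ratios between successive smoking levels.
local_odds_ratios = [odds[i + 1] / odds[i] for i in range(n - 1)]

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
print("local odds ratios:", [round(v, 4) for v in local_odds_ratios])
```

Comparing `local_odds_ratios` with `math.exp(slope)` gives a sense of how closely the sample values follow the constant local odds ratio implied by the linear model.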
c) Using these data, can you estimate the probability of lung cancer at each level of smoking?
Are the estimated odds ratios in part (a) meaningful? Explain.
There is a 1-1 correspondence between the odds and the probability:
$\Omega = \frac{\pi}{1 - \pi}$ and $\pi = \frac{\Omega}{1 + \Omega}$. Then, from the odds in the previous table, we calculate the following
probabilities:
Daily Avg. No. of Cigarettes    Odds of Lung Cancer    Probability
None                            0.114754               0.102941
<5                              0.426357               0.298913
5 – 14                          0.857895               0.461756
15 – 24                         1.102088               0.524283
25 – 49                         1.902597               0.655481
50 +                            3.166667               0.760000
These, however, are not the probabilities of occurrence of lung cancer for given smoking levels.
The data are from an artificial, retrospective study, in which a group of lung cancer patients at
each level of smoking were matched with a group of patients without lung cancer at the same
level of smoking. Each of the probabilities then represents the probability that, when randomly
selecting a patient from the smoking level group, the patient selected will be from the lung
cancer group. Likewise, the odds ratios from part (a) relate to random selection of a patient from
each pair of smoking levels. It is not proper to interpret the probabilities as the probabilities of
having lung cancer for given smoking levels.
d) Show that the disease groups are stochastically ordered with respect to their distributions on
smoking of cigarettes (see Problem 2.34 and Section 7.3.4). Interpret.
We calculate the empirical distribution function over smoking levels for the lung cancer patients
and the empirical distribution function for the control patients. The cumulative proportions are
tabulated below.
Daily Avg. No. of Cigarettes    Lung Cancer, Cumulative Probability    Control, Cumulative Probability
None                            0.005158                               0.044952
<5                              0.045689                               0.140015
5 – 14                          0.406043                               0.560059
15 – 24                         0.756080                               0.877671
25 – 49                         0.971997                               0.991157
50 +                            1.000000                               1.000000
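The two distributions are stochastically ordered, since at every smoking level the cumulative
probability for the lung cancer patients is no greater than the corresponding cumulative
probability for the control patients. That is, the lung cancer patients tend to fall at higher
levels of smoking than the control patients.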
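The ordering can be checked directly by comparing the two empirical distribution functions level by level. A minimal Python sketch, using the cumulative values tabulated above (the variable names are ours):

```python
levels = ["None", "<5", "5-14", "15-24", "25-49", "50+"]
cdf_cancer = [0.005158, 0.045689, 0.406043, 0.756080, 0.971997, 1.000000]
cdf_control = [0.044952, 0.140015, 0.560059, 0.877671, 0.991157, 1.000000]

# The lung cancer group is stochastically higher on smoking if its
# distribution function never exceeds that of the control group.
stochastically_ordered = all(
    f_cancer <= f_control for f_cancer, f_control in zip(cdf_cancer, cdf_control)
)

for lvl, f_cancer, f_control in zip(levels, cdf_cancer, cdf_control):
    print(f"{lvl:>6}: cancer {f_cancer:.6f}  control {f_control:.6f}")
print("stochastically ordered:", stochastically_ordered)
```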