P6104 Introduction to Biostatistical Methods

Autumn 2001
Homework 2 Solutions (due Monday, September 24)
1) Rosner, problems 3.17-3.28
Let A={77-year-old man has Alzheimer’s}, B={76-year-old woman has
Alzheimer’s}, C={82-year-old woman has Alzheimer’s}.
We can start off with the following probabilities:
P(A) = .049
P(B) = .023
P(C) = .078
P(Ā) = 1 – .049 = .951
P(B̄) = 1 – .023 = .977
P(C̄) = 1 – .078 = .922
3.17
Since the events are independent we simply multiply the 3 probabilities:
P(A ∩ B ∩ C) = P(A) x P(B) x P(C) = .049 x .023 x .078 = .000088
3.18
We want the probability that one or both of the women are diseased. While
these events do not seem to be dependent, we cannot assume they are mutually
exclusive, so we need to use the “Addition Law of Probability for Independent
Events” and subtract out their intersection:
P(B U C) = P(B) + P(C) – P(B ∩ C) = .023 + .078 – (.023 x .078) = .099
Note: another algebraically equivalent way of writing this is:
P(B U C) = P(B) + P(C) x [1 – P(B)] = .023 + .078(1 – .023) = .099
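As a quick numerical check, the two forms of the calculation in 3.18 can be compared in a few lines of Python. This is only a sketch; the variable names are made up for illustration and the prevalences are the ones quoted above.

```python
# Prevalences quoted in the solution above.
p_b = 0.023  # B: 76-year-old woman has Alzheimer's
p_c = 0.078  # C: 82-year-old woman has Alzheimer's

# Addition law; independence lets us write P(B ∩ C) as P(B) x P(C).
union_addition_law = p_b + p_c - p_b * p_c

# The algebraically equivalent form P(B) + P(C) x [1 - P(B)].
union_factored = p_b + p_c * (1 - p_b)

print(f"{union_addition_law:.3f} {union_factored:.3f}")
```

Both forms should agree to within floating-point error and round to .099.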
3.19
It is easiest to think of the probability that one or more are diseased as the
complement of the probability that none is diseased:
P(A U B U C) = 1 – P(Ā ∩ B̄ ∩ C̄) = 1 – [(1 – .049) x (1 – .023) x (1 – .078)]
= 1 – (.951 x .977 x .922) = 1 – .856655 = .1433
A second way of doing this is to use the extension of the addition law to more than
two events (see Rosner, pg.54):
P(A U B U C) = P(A) + P(B) + P(C) – P(A ∩ B) – P(A ∩ C) – P(B ∩ C) + P(A ∩ B ∩ C)
= .049 + .023 + .078 – (.049 x .023) – (.049 x .078) – (.023 x .078) + (.049 x .023 x .078)
= .049 + .023 + .078 – .00113 – .00382 – .00179 + .00009 = .1433
If you prefer (or did!) the third and lengthiest calculation, you need to add together
a whole bunch of probabilities; specifically, the three probabilities representing
exactly one is diseased, the three probabilities representing exactly two are
diseased, and finally the probability all three are diseased:
P(A U B U C) = P(A ∩ B̄ ∩ C̄) + P(Ā ∩ B ∩ C̄) + P(Ā ∩ B̄ ∩ C)
+ P(A ∩ B ∩ C̄) + P(A ∩ B̄ ∩ C) + P(Ā ∩ B ∩ C) + P(A ∩ B ∩ C)
= (.049 x .977 x .922) + (.951 x .023 x .922) + (.951 x .977 x .078)
+ (.049 x .023 x .922) + (.049 x .977 x .078) + (.951 x .023 x .078) + (.049 x .023 x .078)
= .04414 + .02017 + .07247 + .00104 + .00373 + .00171 + .00009 = .1433
3.20
The probability that exactly one is diseased is contained in the long sum above:
P(A ∩ B̄ ∩ C̄) + P(Ā ∩ B ∩ C̄) + P(Ā ∩ B̄ ∩ C)
= (.049 x .977 x .922) + (.951 x .023 x .922) + (.951 x .977 x .078)
= .04414 + .02017 + .07247 = .1368
3.21
To find the probability that a woman is diseased given exactly one is diseased,
we can use the entire result in 3.20 above as our denominator (this is the
probability of the conditioning event). The numerator is the probability that A (the
man) is unaffected while either B or C is diseased, which is the sum of the second
and third probabilities in 3.20 above, so we have:
[P(Ā ∩ B ∩ C̄) + P(Ā ∩ B̄ ∩ C)] / [P(A ∩ B̄ ∩ C̄) + P(Ā ∩ B ∩ C̄) + P(Ā ∩ B̄ ∩ C)]
= (.02017 + .07247) / .1368 = .677
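The calculations in 3.20 and 3.21 can be reproduced with a short script. This is a sketch under the same independence assumption; the variable names are illustrative only.

```python
# Prevalences for A (man), B (younger woman), C (older woman).
p_a, p_b, p_c = 0.049, 0.023, 0.078

# Exactly one diseased: one term per person who could be the single case.
only_a = p_a * (1 - p_b) * (1 - p_c)
only_b = (1 - p_a) * p_b * (1 - p_c)
only_c = (1 - p_a) * (1 - p_b) * p_c
exactly_one = only_a + only_b + only_c

# 3.21: the single case is a woman, given that exactly one is diseased.
woman_given_exactly_one = (only_b + only_c) / exactly_one

print(f"{exactly_one:.4f} {woman_given_exactly_one:.3f}")
```

The two printed values should match the .1368 and .677 obtained by hand.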
3.22
We now want the probability that both are women given that two are affected:
The numerator is the probability that the man is unaffected while both women are
affected, and the denominator is the probability that two of three are affected (this
can happen three different ways):
P(Ā ∩ B ∩ C) / [P(A ∩ B ∩ C̄) + P(A ∩ B̄ ∩ C) + P(Ā ∩ B ∩ C)]
= (.951 x .023 x .078) / [(.049 x .023 x .922) + (.049 x .977 x .078) + (.951 x .023 x .078)]
= .00171 / .00648 = .2639
3.23
We now want the probability that both affected individuals are less than 80,
given that two of the three are affected. They must be the man and the younger woman.
The denominator is the same as above, but now the numerator is:
P(A ∩ B ∩ C̄) / [P(A ∩ B ∩ C̄) + P(A ∩ B̄ ∩ C) + P(Ā ∩ B ∩ C)]
= (.049 x .023 x .922) / [(.049 x .023 x .922) + (.049 x .977 x .078) + (.951 x .023 x .078)]
= .00104 / .00648 = .1604
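The conditional probabilities in 3.22 and 3.23 share the same denominator, which a short script makes explicit. This is a sketch with illustrative variable names. Note that carrying full precision gives about .2633 for 3.22; the .2639 above comes from dividing the rounded values .00171 and .00648.

```python
# Prevalences for A (man, 77), B (woman, 76), C (woman, 82).
p_a, p_b, p_c = 0.049, 0.023, 0.078

# The three ways exactly two of the three can be affected.
man_and_younger_woman = p_a * p_b * (1 - p_c)
man_and_older_woman = p_a * (1 - p_b) * p_c
both_women = (1 - p_a) * p_b * p_c
exactly_two = man_and_younger_woman + man_and_older_woman + both_women

# 3.22: both affected are women, given exactly two are affected.
both_women_given_two = both_women / exactly_two

# 3.23: both affected are under 80 (the man and the younger woman).
under_80_given_two = man_and_younger_woman / exactly_two

print(f"{both_women_given_two:.4f} {under_80_given_two:.4f}")
```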
Now we are given some additional information: the joint probability that both
members of a 75-79 year-old married couple have Alzheimer’s is .0015.
Let M = {75-79 year-old man has Alzheimer’s},
W = {75-79 year-old woman has Alzheimer’s}
We know P(M ∩ W) = .0015
3.24
We want P(M | W). This is the joint over the marginal:
P(M | W) = P(M ∩ W) / P(W) = .0015 / .023 = .0652
If we look in the table, we see the probability that a 75-79 year-old man is affected
is only .049, so clearly these events are not independent: P(M | W) ≠ P(M)
3.25
Now we want P(W | M). As above, we have
P(W | M) = P(W ∩ M) / P(M) = .0015 / .049 = .0306
The conditional probability (.0306) is higher than the unconditional probability
P(W) = .023, as above.
3.26
The probability at least one is affected is simply the union of the two events,
since this includes either one or both being affected. We are given that the joint
probability is .0015, so we use this number here:
P(M U W) = P(M) + P(W) – P(M ∩ W) = .049 + .023 – .0015 = .0705
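The married-couple calculations in 3.24 through 3.26 all come from the same three numbers, so they are easy to verify together. The sketch below uses illustrative variable names and also compares the given joint probability with the value independence would imply.

```python
p_m = 0.049      # P(M): 75-79 year-old man has Alzheimer's
p_w = 0.023      # P(W): 75-79 year-old woman has Alzheimer's
p_both = 0.0015  # P(M ∩ W), given in the problem

# 3.24 and 3.25: conditional = joint / marginal.
p_m_given_w = p_both / p_w
p_w_given_m = p_both / p_m

# 3.26: at least one affected, using the given joint probability.
p_either = p_m + p_w - p_both

# Under independence the joint probability would be P(M) x P(W).
joint_if_independent = p_m * p_w

print(f"{p_m_given_w:.4f} {p_w_given_m:.4f} {p_either:.4f}")
print(f"{joint_if_independent:.6f}")
```

The independent-case joint probability is about .0011, smaller than the given .0015, which is consistent with the conclusion that the two events are positively associated.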
3.27
To compute the expected overall prevalence, we need to combine the
information we have regarding the prevalence by age-sex group and the
distribution of the population by age-sex group.
For each age-sex group (there are 10 of them), we want:
P(Alzheimer’s | a particular age-sex group) x P(a particular age-sex group)
which gives us the probability of having Alzheimer’s and being in a particular
age-sex group. Summing these products over all 10 groups gives the overall
prevalence. (This is the “Total Probability Rule.”)
Let A={Alzheimer’s}.
P(A) = P(A | 65-69 y.o.male) x P(65-69 y.o.male)
+ P(A | 65-69 y.o.female) x P(65-69 y.o.female)
+ P(A | 70-74 y.o.male) x P(70-74 y.o.male)
+ P(A | 70-74 y.o.female) x P(70-74 y.o.female)
+ P(A | 75-79 y.o.male) x P(75-79 y.o.male)
+ P(A | 75-79 y.o.female) x P(75-79 y.o.female)
+ P(A | 80-84 y.o.male) x P(80-84 y.o.male)
+ P(A | 80-84 y.o.female) x P(80-84 y.o.female)
+ P(A | 85+ y.o.male) x P(85+ y.o.male)
+ P(A | 85+ y.o.female) x P(85+ y.o.female)
= (.05 x .016) + (.10 x 0.0) + (.09 x 0.0) + (.17 x .022) + (.11 x .049) + (.18 x .023)
+ (.08 x .086) + (.12 x .078) + (.04 x .35) + (.06 x .279) = .061
The expected overall prevalence is 6.1% (or, 6.1 per 100 population).
3.28
We just computed the expected overall prevalence in 3.27. If there are 1000
people 65+ years old, then the expected number of cases of Alzheimer’s disease
in the community is simply .061 x 1000 = 61.
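The total probability calculation in 3.27 and the expected count in 3.28 amount to a weighted sum, sketched below. The pairing of each number with an age-sex group is inferred from the order of the terms in the sum above, so treat the labels as an assumption; the variable names are illustrative.

```python
# (group, share of the 65+ population, prevalence of Alzheimer's in the group)
groups = [
    ("65-69 male",   0.05, 0.016),
    ("65-69 female", 0.10, 0.0),
    ("70-74 male",   0.09, 0.0),
    ("70-74 female", 0.17, 0.022),
    ("75-79 male",   0.11, 0.049),
    ("75-79 female", 0.18, 0.023),
    ("80-84 male",   0.08, 0.086),
    ("80-84 female", 0.12, 0.078),
    ("85+ male",     0.04, 0.35),
    ("85+ female",   0.06, 0.279),
]

# The population shares should cover everyone aged 65+.
total_share = sum(share for _, share, _ in groups)

# Total probability rule: sum of P(A | group) x P(group).
overall_prevalence = sum(share * prev for _, share, prev in groups)

# 3.28: expected number of cases among 1000 people aged 65+.
expected_cases = overall_prevalence * 1000

print(f"{total_share:.2f} {overall_prevalence:.3f} {expected_cases:.0f}")
```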
Rosner 3.76-3.78
3.76 If the manual measurements are regarded as the “true” measure (i.e. they can be
thought of as the disease status, D), so that the automated measure is thought of
as a screen (call it S), what is the sensitivity of the automated measurements?
We can think of the 2 x 2 table as follows: the rows are the screen result, where
<10 is a negative result, while the columns are the disease status, with <10 being
no disease. Dividing each entry in the table by 79, the total number of persons in
the study, gives a joint probability (e.g. 6/79 = P(S+ ∩ D)).
Sensitivity = P(S+ | D) = P(S+ ∩ D) / P(D) = (6/79) / (13/79) = 6 / 13 = .462
3.77 The specificity is:
P(S- | no D) = P(S- ∩ no D) / P(no D) = (51/79) / (66/79) = 51 / 66 = .773
3.78 What are the PPV (PV+) and NPV (PV-)?
PPV = P(D | S+) = P(D ∩ S+) / P(S+) = (6/79) / (21/79) = 6 / 21 = .286
NPV = P(no D | S-) = P(no D ∩ S-) / P(S-) = (51/79) / (58/79) = 51 / 58 = .879
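The four screening measures in 3.76 through 3.78 can be computed directly from the cell counts. In the sketch below, the two off-diagonal counts are not quoted in the solution; they are derived from the margins (13 – 6 and 66 – 51), and the variable names are illustrative.

```python
# Cell counts of the 2 x 2 table (automated screen S vs. manual "truth" D).
true_pos = 6             # S+ and D
true_neg = 51            # S- and no D
false_neg = 13 - true_pos  # S- and D, derived from the D column total of 13
false_pos = 66 - true_neg  # S+ and no D, derived from the no-D column total of 66

total = true_pos + true_neg + false_neg + false_pos

sensitivity = true_pos / (true_pos + false_neg)  # P(S+ | D)
specificity = true_neg / (true_neg + false_pos)  # P(S- | no D)
ppv = true_pos / (true_pos + false_pos)          # P(D | S+)
npv = true_neg / (true_neg + false_neg)          # P(no D | S-)

print(total)
print(f"{sensitivity:.3f} {specificity:.3f} {ppv:.3f} {npv:.3f}")
```

The total should come out to 79, and the four measures should match .462, .773, .286 and .879.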
Problem 2) A manufacturer claims its drug test will detect steroid use correctly 95% of
the time. Suppose 15% of steroid-free athletes test positive (i.e. incorrectly).
Suppose 10% of all athletes in a certain sport use steroids. If a tested athlete tests
positive, what is the probability that he uses steroids?
We are given that the sensitivity of the test is .95:
Sensitivity = P(S+ | D) = .95
We are also given P(S+ | no D) = .15 and P(D) = .10, so P(no D) = .90.
(Note: we don’t need the specificity, but since it is simply P(S- | no D), it is
equal to 1 – .15 = .85.)
If an athlete tests positive, to determine the probability of steroid use (PPV) we have to
account for both ways a positive result can arise: a correct positive in a user and
an incorrect positive in a non-user.
PPV = P(D | S+) = [P(S+ | D) x P(D)] / [P(S+ | D) x P(D) + P(S+ | no D) x P(no D)]
= (.95 x .10) / [(.95 x .10) + (.15 x .90)] = .095 / .23 = .413
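The Bayes’ rule calculation for Problem 2 is short enough to check in code. This is a minimal sketch with illustrative variable names, using only the three numbers given in the problem.

```python
sensitivity = 0.95          # P(S+ | D): test detects a steroid user
false_positive_rate = 0.15  # P(S+ | no D): steroid-free athlete tests positive
prevalence = 0.10           # P(D): share of athletes who use steroids

# Two ways to test positive: a user correctly flagged, or a non-user wrongly flagged.
true_positive_mass = sensitivity * prevalence
false_positive_mass = false_positive_rate * (1 - prevalence)

# Bayes' rule: P(D | S+).
ppv = true_positive_mass / (true_positive_mass + false_positive_mass)

print(f"{ppv:.3f}")
```

Even with a 95% detection rate, fewer than half of the positive tests belong to actual users, because non-users outnumber users nine to one.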