P6104 Introduction to Biostatistical Methods, Autumn 2001
Homework 2 Solutions (due Monday, September 24)

1) Rosner, problems 3.17-3.28

Let A = {77-year-old man has Alzheimer's}, B = {76-year-old woman has Alzheimer's}, C = {82-year-old woman has Alzheimer's}. We start with the following probabilities and their complements:

$$P(A) = .049, \quad P(B) = .023, \quad P(C) = .078$$
$$P(\bar{A}) = 1 - .049 = .951, \quad P(\bar{B}) = 1 - .023 = .977, \quad P(\bar{C}) = 1 - .078 = .922$$

3.17 Since the events are independent, we simply multiply the three probabilities:

$$P(A \cap B \cap C) = P(A) \times P(B) \times P(C) = .049 \times .023 \times .078 = .000088$$

3.18 We want the probability that one or both of the women are affected. Although these events appear to be independent, we cannot assume they are mutually exclusive, so we use the "Addition Law of Probability for Independent Events," subtracting out their intersection, which by independence equals $P(B) \times P(C)$:

$$P(B \cup C) = P(B) + P(C) - P(B \cap C) = .023 + .078 - (.023 \times .078) = .099$$

Note: an algebraically equivalent way of writing this is

$$P(B \cup C) = P(B) + P(C) \times [1 - P(B)] = .023 + .078(1 - .023) = .099$$

3.19 It is easiest to think of the probability that one or more are affected as the complement of the probability that none is affected:

$$P(A \cup B \cup C) = 1 - P(\bar{A} \cap \bar{B} \cap \bar{C}) = 1 - [(1 - .049) \times (1 - .023) \times (1 - .078)] = 1 - (.951 \times .977 \times .922) = 1 - .856655 = .1433$$

A second way of doing this is to use the extension of the addition law to more than two events (see Rosner, p. 54):

$$\begin{aligned}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \\
&= .049 + .023 + .078 - (.049 \times .023) - (.049 \times .078) - (.023 \times .078) + (.049 \times .023 \times .078) \\
&= .049 + .023 + .078 - .00113 - .00382 - .00179 + .00009 = .1433
\end{aligned}$$

If you prefer (or did!) the third and lengthiest calculation, you need to add together seven probabilities: the three representing exactly one affected, the three representing exactly two affected, and finally the probability that all three are affected:

$$\begin{aligned}
P(A \cup B \cup C) &= P(A \cap \bar{B} \cap \bar{C}) + P(\bar{A} \cap B \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap C) \\
&\quad + P(A \cap B \cap \bar{C}) + P(A \cap \bar{B} \cap C) + P(\bar{A} \cap B \cap C) + P(A \cap B \cap C) \\
&= (.049 \times .977 \times .922) + (.951 \times .023 \times .922) + (.951 \times .977 \times .078) \\
&\quad + (.049 \times .023 \times .922) + (.049 \times .977 \times .078) + (.951 \times .023 \times .078) + (.049 \times .023 \times .078) \\
&= .04414 + .02017 + .07247 + .00104 + .00373 + .00171 + .00009 = .1433
\end{aligned}$$

3.20 The probability that exactly one is affected is contained in the long sum above:

$$P(A \cap \bar{B} \cap \bar{C}) + P(\bar{A} \cap B \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap C) = (.049 \times .977 \times .922) + (.951 \times .023 \times .922) + (.951 \times .977 \times .078) = .04414 + .02017 + .07247 = .1368$$

3.21 To find the probability that a woman is affected given that exactly one person is affected, we use the entire result of 3.20 as the denominator (it is the probability of the conditioning event). The numerator is the probability that A (the man) is unaffected while exactly one of B or C is affected, which is the sum of the second and third probabilities in 3.20:

$$\frac{P(\bar{A} \cap B \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap C)}{P(A \cap \bar{B} \cap \bar{C}) + P(\bar{A} \cap B \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap C)} = \frac{.02017 + .07247}{.1368} = .677$$

3.22 We now want the probability that both affected individuals are women, given that exactly two are affected. The numerator is the probability that the man is unaffected while both women are affected, and the denominator is the probability that exactly two of the three are affected (this can happen three different ways):

$$\frac{P(\bar{A} \cap B \cap C)}{P(A \cap B \cap \bar{C}) + P(A \cap \bar{B} \cap C) + P(\bar{A} \cap B \cap C)} = \frac{.951 \times .023 \times .078}{(.049 \times .023 \times .922) + (.049 \times .977 \times .078) + (.951 \times .023 \times .078)} = \frac{.00171}{.00648} = .2639$$

3.23 We now want the probability that both affected individuals are younger than 80, given that exactly two are affected. They must be the man and the younger woman. The denominator is the same as above, but now the numerator is $P(A \cap B \cap \bar{C})$:

$$\frac{P(A \cap B \cap \bar{C})}{P(A \cap B \cap \bar{C}) + P(A \cap \bar{B} \cap C) + P(\bar{A} \cap B \cap C)} = \frac{.049 \times .023 \times .922}{(.049 \times .023 \times .922) + (.049 \times .977 \times .078) + (.951 \times .023 \times .078)} = \frac{.00104}{.00648} = .1604$$
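The seven-term sum above is really a sum over the 2^3 = 8 affected/unaffected patterns for A, B and C, and every answer in 3.17-3.23 adds up some subset of those patterns. As a minimal sketch, assuming only the three prevalences and the independence assumption used above, the following Python enumerates all eight patterns and recomputes each answer. The helper names (`joint`, `prob`, `n_affected`) are hypothetical and introduced only for illustration.

```python
from itertools import product

# Prevalences from the text; independence is assumed, as in 3.17-3.23.
# A = 77-year-old man, B = 76-year-old woman, C = 82-year-old woman.
P = {"A": 0.049, "B": 0.023, "C": 0.078}


def joint(pattern):
    """Probability of one affected/unaffected pattern under independence.

    `pattern` maps each person to True (affected) or False (unaffected).
    """
    prob = 1.0
    for person, affected in pattern.items():
        prob *= P[person] if affected else 1 - P[person]
    return prob


# All 2**3 = 8 affected/unaffected patterns for A, B, C.
PATTERNS = [dict(zip("ABC", flags)) for flags in product([True, False], repeat=3)]


def prob(event):
    """Add up the joint probabilities of the patterns in which `event` holds."""
    return sum(joint(p) for p in PATTERNS if event(p))


def n_affected(pattern):
    """Number of affected people in a pattern (True counts as 1)."""
    return sum(pattern.values())


p_all_three = prob(lambda p: n_affected(p) == 3)        # 3.17
p_either_woman = prob(lambda p: p["B"] or p["C"])       # 3.18
p_at_least_one = prob(lambda p: n_affected(p) >= 1)     # 3.19
p_exactly_one = prob(lambda p: n_affected(p) == 1)      # 3.20
p_exactly_two = prob(lambda p: n_affected(p) == 2)

# 3.21: a woman is the affected one, given exactly one affected (man unaffected).
p_woman_given_one = prob(lambda p: n_affected(p) == 1 and not p["A"]) / p_exactly_one
# 3.22: both affected are women, given exactly two affected (man unaffected).
p_women_given_two = prob(lambda p: n_affected(p) == 2 and not p["A"]) / p_exactly_two
# 3.23: both affected are under 80 (A and B), given exactly two affected.
p_under80_given_two = prob(lambda p: n_affected(p) == 2 and not p["C"]) / p_exactly_two

for label, value in [
    ("3.17", p_all_three), ("3.18", p_either_woman), ("3.19", p_at_least_one),
    ("3.20", p_exactly_one), ("3.21", p_woman_given_one),
    ("3.22", p_women_given_two), ("3.23", p_under80_given_two),
]:
    print(f"{label}: {value:.6f}")
```

Carried at full precision, 3.22 comes out to about .2633 rather than .2639. The small difference comes from dividing the rounded intermediate values .00171 and .00648. The other answers agree with the hand calculations to the precision shown.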
We are given that the joint probability that both members of a 75-79-year-old married couple have Alzheimer's is .0015. Let M = {75-79-year-old man has Alzheimer's} and W = {75-79-year-old woman has Alzheimer's}. We know

$$P(M \cap W) = .0015$$

3.24 We want $P(M \mid W)$. This is the joint probability divided by the marginal probability:

$$P(M \mid W) = \frac{P(M \cap W)}{P(W)} = \frac{.0015}{.023} = .0652$$

From the table, the unconditional probability that a 75-79-year-old man is affected is only .049, so $P(M \mid W) \neq P(M)$ and these events are clearly not independent. (Equivalently, $P(M) \times P(W) = .049 \times .023 = .0011 \neq .0015$.)

3.25 Now we want $P(W \mid M)$. As above,

$$P(W \mid M) = \frac{P(W \cap M)}{P(M)} = \frac{.0015}{.049} = .0306$$

As in 3.24, the conditional probability is higher than the unconditional probability, $P(W) = .023$.

3.26 The probability that at least one is affected is the union of the two events, since this includes either one or both being affected. We are given that the joint probability is .0015, so we use this number here (rather than the product $P(M) \times P(W)$):

$$P(M \cup W) = P(M) + P(W) - P(M \cap W) = .049 + .023 - .0015 = .0705$$

3.27 To compute the expected overall prevalence, we need to combine the prevalence within each age-sex group with the distribution of the population across age-sex groups. For each of the 10 age-sex groups, we compute

$$P(\text{Alzheimer's} \mid \text{age-sex group}) \times P(\text{age-sex group}),$$

which gives us the probability of having Alzheimer's and being in that particular age-sex group. Summing these over all groups gives the overall probability (this is the "Total Probability Rule"). Let A = {Alzheimer's}.
$$\begin{aligned}
P(A) = {} & P(A \mid \text{65-69 male}) \times P(\text{65-69 male}) + P(A \mid \text{65-69 female}) \times P(\text{65-69 female}) \\
& + P(A \mid \text{70-74 male}) \times P(\text{70-74 male}) + P(A \mid \text{70-74 female}) \times P(\text{70-74 female}) \\
& + P(A \mid \text{75-79 male}) \times P(\text{75-79 male}) + P(A \mid \text{75-79 female}) \times P(\text{75-79 female}) \\
& + P(A \mid \text{80-84 male}) \times P(\text{80-84 male}) + P(A \mid \text{80-84 female}) \times P(\text{80-84 female}) \\
& + P(A \mid \text{85+ male}) \times P(\text{85+ male}) + P(A \mid \text{85+ female}) \times P(\text{85+ female}) \\
= {} & (.016 \times .05) + (0.0 \times .10) + (0.0 \times .09) + (.022 \times .17) + (.049 \times .11) \\
& + (.023 \times .18) + (.086 \times .08) + (.078 \times .12) + (.35 \times .04) + (.279 \times .06) \\
= {} & .061
\end{aligned}$$

The expected overall prevalence is 6.1% (or 6.1 per 100 population).

3.28 We just computed the expected overall prevalence in 3.27. If there are 1000 people 65 years of age or older in the community, then the expected number of cases of Alzheimer's disease is simply $.061 \times 1000 = 61$.

Rosner 3.76-3.78

3.76 If the manual measurements are regarded as the "true" measure (i.e., they play the role of disease status, D) and the automated measurement is thought of as a screening test (call it S), what is the sensitivity of the automated measurements? We can think of the 2 × 2 table as follows: the rows are the screen result, where <10 is a negative result, while the columns are the disease status, with <10 meaning no disease. Dividing each cell by 79, the total number of persons in the study, gives joint probabilities (e.g., $6/79 = P(S^+ \cap D)$).

$$\text{Sensitivity} = P(S^+ \mid D) = \frac{P(S^+ \cap D)}{P(D)} = \frac{6/79}{13/79} = \frac{6}{13} = .462$$

3.77 The specificity is

$$\text{Specificity} = P(S^- \mid \bar{D}) = \frac{P(S^- \cap \bar{D})}{P(\bar{D})} = \frac{51/79}{66/79} = \frac{51}{66} = .773$$

3.78 What are the PPV (PV+) and NPV (PV−)?

$$\text{PPV} = P(D \mid S^+) = \frac{P(D \cap S^+)}{P(S^+)} = \frac{6/79}{21/79} = \frac{6}{21} = .286$$
$$\text{NPV} = P(\bar{D} \mid S^-) = \frac{P(\bar{D} \cap S^-)}{P(S^-)} = \frac{51/79}{58/79} = \frac{51}{58} = .879$$

Problem 2) A manufacturer claims its drug test will detect steroid use correctly 95% of the time. However, 15% of steroid-free athletes test positive (i.e., incorrectly). Suppose 10% of all athletes in a certain sport use steroids. If a tested athlete tests positive, what is the probability that he uses steroids?

We are given that the sensitivity of the test is .95:

$$\text{Sensitivity} = P(S^+ \mid D) = .95$$

We are also given $P(S^+ \mid \bar{D}) = .15$ and $P(D) = .10$, so $P(\bar{D}) = .90$. (Note: we don't need the specificity directly, but since it is $P(S^- \mid \bar{D})$, it equals $1 - .15 = .85$.) To find the probability of steroid use given a positive test (the PPV), the denominator must include every way of testing positive: true positives (the test was accurate) as well as false positives (the test was inaccurate). By Bayes' rule,

$$\text{PPV} = P(D \mid S^+) = \frac{P(S^+ \mid D) \times P(D)}{P(S^+ \mid D) \times P(D) + P(S^+ \mid \bar{D}) \times P(\bar{D})} = \frac{.95 \times .10}{(.95 \times .10) + (.15 \times .90)} = \frac{.095}{.23} = .413$$
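Problems 3.27-3.28, 3.76-3.78 and Problem 2 rest on two ideas: the Total Probability Rule (a weighted sum of conditional probabilities) and Bayes' rule. The minimal Python sketch below rechecks those answers. The function names are hypothetical helpers introduced here. The 2 × 2 cell counts (6, 15, 7, 51) are derived from the margins quoted in 3.76-3.78 (13 diseased, 66 non-diseased, 21 screen-positive, 58 screen-negative) and are not copied from Rosner's table.

```python
# 3.27: (P(group), P(Alzheimer's | group)) for each age-sex group, from the text.
GROUPS = {
    "65-69 male": (0.05, 0.016),
    "65-69 female": (0.10, 0.0),
    "70-74 male": (0.09, 0.0),
    "70-74 female": (0.17, 0.022),
    "75-79 male": (0.11, 0.049),
    "75-79 female": (0.18, 0.023),
    "80-84 male": (0.08, 0.086),
    "80-84 female": (0.12, 0.078),
    "85+ male": (0.04, 0.35),
    "85+ female": (0.06, 0.279),
}


def total_probability(groups):
    """Total Probability Rule: sum of P(A | group) * P(group) over all groups."""
    return sum(p_group * p_cond for p_group, p_cond in groups.values())


def screening_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_bayes(sensitivity, false_positive_rate, prevalence):
    """Bayes' rule: P(D | S+) from P(S+ | D), P(S+ | not D) and P(D)."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


overall_prevalence = total_probability(GROUPS)        # 3.27
expected_cases = overall_prevalence * 1000            # 3.28
# 3.76-3.78: cells derived from the margins (FP = 21 - 6, FN = 13 - 6).
measures = screening_measures(tp=6, fp=15, fn=7, tn=51)
steroid_ppv = ppv_bayes(0.95, 0.15, 0.10)             # Problem 2

print(f"3.27 prevalence: {overall_prevalence:.3f}")
print(f"3.28 expected cases: {expected_cases:.0f}")
for name, value in measures.items():
    print(f"3.76-3.78 {name}: {value:.3f}")
print(f"Problem 2 PPV: {steroid_ppv:.3f}")
```

The two routes agree. Applying `ppv_bayes` to the 3.76 table, with sensitivity 6/13, false-positive rate 15/66 and prevalence 13/79, gives the same 6/21 as reading the PPV directly off the 2 × 2 table.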