3.2 Conditional Probability and Independent Events

Ismor Fischer, 5/29/2012
Applications include using population-based health studies to estimate
probabilities relating potential risk factors to a particular disease, evaluating
the efficacy of medical diagnostic and screening tests, etc.
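Example:
Events: A = “lung cancer”, B = “smoker”

[Venn diagram: events A and B within sample space S, with region probabilities
P(A ∩ Bc) = 0.03, P(A ∩ B) = 0.12, P(Ac ∩ B) = 0.04, and P(Ac ∩ Bc) = 0.81]

                    Disease Status
Smoker      Lung cancer (A)    No lung cancer (Ac)    Total
Yes (B)     0.12               0.04                   0.16
No (Bc)     0.03               0.81                   0.84
Total       0.15               0.85                   1.00

Probabilities:
P(A) = 0.15
P(B) = 0.16
P(A ∩ B) = 0.12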
Definition:
Conditional Probability of Event A, given Event B (where P(B) ≠ 0)
P(A | B) = P(A ∩ B) / P(B) = 0.12 / 0.16 = 0.75 >> 0.15 = P(A).
Comments:
• P(B | A) = P(B ∩ A) / P(A) = 0.12 / 0.15 = 0.80, so P(A | B) ≠ P(B | A) in general.
• General formula can be rewritten: P(A ∩ B) = P(A | B) × P(B) ← IMPORTANT
Example: P(Angel barks) = 0.1
P(Brutus barks) = 0.2
P(Angel barks | Brutus barks) = 0.3
Therefore…
P(Angel and Brutus bark) = P(Angel barks | Brutus barks) × P(Brutus barks) = 0.3 × 0.2 = 0.06
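As a rough illustration, the conditional probabilities in the lung cancer and smoking example can be computed directly from the four joint probabilities in its table. The sketch below is only one way to organize the arithmetic; the names `joint` and `marginal` are illustrative and not part of the notes, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Joint probabilities from the lung cancer / smoker table, keyed by
# (disease status, smoking status). Fractions keep the arithmetic exact.
joint = {
    ("A", "B"): Fraction(12, 100),    # lung cancer, smoker
    ("A", "Bc"): Fraction(3, 100),    # lung cancer, non-smoker
    ("Ac", "B"): Fraction(4, 100),    # no lung cancer, smoker
    ("Ac", "Bc"): Fraction(81, 100),  # no lung cancer, non-smoker
}

def marginal(position, label):
    """Sum the joint probabilities whose key has `label` at `position`."""
    return sum(p for key, p in joint.items() if key[position] == label)

p_a = marginal(0, "A")          # P(A) = 0.15
p_b = marginal(1, "B")          # P(B) = 0.16
p_a_and_b = joint[("A", "B")]   # P(A ∩ B) = 0.12

p_a_given_b = p_a_and_b / p_b   # P(A | B) = P(A ∩ B) / P(B)
p_b_given_a = p_a_and_b / p_a   # P(B | A) = P(A ∩ B) / P(A)

print(float(p_a_given_b), float(p_b_given_a))  # prints 0.75 0.8
```

The two results differ, matching the comment that P(A | B) ≠ P(B | A) in general.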
Example: Suppose that two balls are to be randomly drawn, one after another,
from a container holding four red balls and two green balls. Under the scenario
of sampling without replacement, calculate the probabilities of the events
A = “First ball is red”, B = “Second ball is red”, and A ∩ B = “First ball is red
AND second ball is red”. (As an exercise, list the 6 × 5 = 30 outcomes in the
sample space of this experiment, and use “brute force” to solve this problem.)
[Figure: container holding four red balls (R1, R2, R3, R4) and two green balls (G1, G2)]
This type of problem – known as an “urn model” – can be solved with the use of
a tree diagram, where each branch of the “tree” represents a specific event,
conditioned on a preceding event. The product of the probabilities of all such
events along a particular sequence of branches is equal to the corresponding
intersection probability, via the previous formula. In this example, we obtain the
following values:
Tree diagram (1st draw, then 2nd draw, then the intersection probability):
P(A) = 4/6
    P(B | A) = 3/5       P(A ∩ B) = 12/30
    P(Bc | A) = 2/5      P(A ∩ Bc) = 8/30
P(Ac) = 2/6
    P(B | Ac) = 4/5      P(Ac ∩ B) = 8/30
    P(Bc | Ac) = 1/5     P(Ac ∩ Bc) = 2/30
We can calculate the probability P(B) by adding the two intersection probabilities above that involve B,
i.e., P(B) = P(A ∩ B) + P(Ac ∩ B) = 12/30 + 8/30 = 20/30, or P(B) = 2/3.
This last formula – which can be written as P(B) = P(B | A) P(A) + P(B | Ac) P(Ac) –
can be extended to more general situations, where it is known as the Law of Total
Probability, and is a useful tool in Bayes’ Theorem (next section).
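The “brute force” approach suggested for the urn example can be sketched in a few lines: list all 30 equally likely ordered draws and count the favorable ones. This is a minimal sketch, not the notes' own method; the helper names `prob` and `is_red` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "R3", "R4", "G1", "G2"]

# Every ordered pair of distinct balls: 6 x 5 = 30 equally likely outcomes.
outcomes = list(permutations(balls, 2))

def is_red(ball):
    return ball.startswith("R")

def prob(event):
    """Probability of an event, given as a predicate on (first, second)."""
    favorable = sum(1 for first, second in outcomes if event(first, second))
    return Fraction(favorable, len(outcomes))

p_a = prob(lambda first, second: is_red(first))                        # P(A)
p_b = prob(lambda first, second: is_red(second))                       # P(B)
p_a_and_b = prob(lambda first, second: is_red(first) and is_red(second))
p_ac_and_b = prob(lambda first, second: not is_red(first) and is_red(second))

# Law of Total Probability: P(B) = P(A ∩ B) + P(Ac ∩ B)
assert p_b == p_a_and_b + p_ac_and_b

print(len(outcomes), p_a, p_a_and_b, p_ac_and_b, p_b)
```

The counts reproduce the tree diagram values in lowest terms: 12/30 = 2/5, 8/30 = 4/15, and P(B) = 20/30 = 2/3.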
Suppose event C = “coffee drinker.”

[Venn diagram: events A and C within sample space S, with region probabilities
P(A ∩ Cc) = 0.09, P(A ∩ C) = 0.06, P(Ac ∩ C) = 0.34, and P(Ac ∩ Cc) = 0.51]

                       Disease Status
Coffee Drinker    Lung cancer (A)    No lung cancer (Ac)    Total
Yes (C)           0.06               0.34                   0.40
No (Cc)           0.09               0.51                   0.60
Total             0.15               0.85                   1.00

Probabilities:
P(A) = 0.15
P(C) = 0.40
P(A ∩ C) = 0.06
Therefore,
P(A | C) = P(A ∩ C) / P(C) = 0.06 / 0.40 = 0.15 = P(A),
i.e., the occurrence of event C gives no information about the probability of event A.
Definition:
Two events A and B are said to be statistically independent if either:
(1) P(A | B) = P(A), i.e., P(B | A) = P(B),
or equivalently,
(2) P(A ∩ B) = P(A) × P(B).
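Condition (2) is easy to test numerically. The sketch below applies it to the two lung cancer examples, smoker (B) and coffee drinker (C), using the probabilities from their tables; the function name `is_independent` is illustrative, and exact fractions are an implementation choice so that equality can be tested without rounding.

```python
from fractions import Fraction

def is_independent(p_both, p_first, p_second):
    """Check the product rule P(X ∩ Y) = P(X) x P(Y) exactly."""
    return p_both == p_first * p_second

p_a = Fraction(15, 100)  # P(A), lung cancer

# Lung cancer (A) and smoker (B): P(A ∩ B) = 0.12, P(B) = 0.16
smoker_independent = is_independent(Fraction(12, 100), p_a, Fraction(16, 100))

# Lung cancer (A) and coffee drinker (C): P(A ∩ C) = 0.06, P(C) = 0.40
coffee_independent = is_independent(Fraction(6, 100), p_a, Fraction(40, 100))

print(smoker_independent, coffee_independent)  # prints False True
```

With measured data, exact equality is too strict a test; comparing the two sides to a stated number of decimal places is the more practical check.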
Exercise: Prove that if events B and C are statistically independent, then so is
each of the following pairs: B and “Not C”; “Not B” and C; “Not B” and “Not C”.
Hint: Let P(B) = b, P(C) = c, and construct a 2 × 2 probability table.
Summary
A, B disjoint ⇔ If either event occurs, then the other cannot occur: P(A ∩ B) = 0.
A, B independent ⇔ If either event occurs, this gives no information about the other:
P(A ∩ B) = P(A) × P(B).
Example:
A = “Select a 2” and B = “Select a ♣” are not disjoint events, because
A ∩ B = {2♣} ≠ ∅. However, P(A ∩ B) = 1/52 = 1/13 × 1/4 = P(A) × P(B); hence
they are independent events. Can two disjoint events ever be independent? Why?
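The card example can be confirmed by enumeration. The sketch below assumes a single card drawn at random from a standard 52-card deck, which the stated values 1/52, 1/13, and 1/4 imply but the example does not say outright; the names `deck` and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

# ASSUMPTION: one card drawn at random from a standard 52-card deck.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) pairs

def prob(event):
    """Probability of an event, given as a predicate on a (rank, suit) card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_two = prob(lambda card: card[0] == "2")            # P(A) = 4/52
p_club = prob(lambda card: card[1] == "clubs")       # P(B) = 13/52
p_both = prob(lambda card: card == ("2", "clubs"))   # P(A ∩ B) = 1/52

print(p_both != 0)               # True: A and B are not disjoint
print(p_both == p_two * p_club)  # True: A and B are independent
```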
A VERY IMPORTANT AND USEFUL FACT: It can be shown that for
any event A, all of the elementary properties of “probability” P(A) covered in
the notes extend to “conditional probability” P(A | B), for any other event B.
For example, since we know that
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)
for any two events A1 and A2, it is also true that
P(A1 ∪ A2 | B) = P(A1 | B) + P(A2 | B) − P(A1 ∩ A2 | B)
for any other event B.
As another example, since we know that P(Ac) = 1 − P(A), it therefore also
follows that P(Ac | B) = 1 − P(A | B).
Exercise: Prove these two statements. (Hint: Sketch a Venn diagram.)
HOWEVER, there is one important exception! We know that if A and B are
two independent events, then P(A ∩ B) = P(A) P(B). But this does not
extend to conditional probabilities! In particular, if C is any other event, then
P(A ∩ B | C) ≠ P(A | C) P(B | C) in general. The following example illustrates
this, for three events A, B, and C:
[Venn diagram: three overlapping events A, B, and C, with region probabilities
.20, .20, .20, .05, .05, .05, .10, and .15]
Exercise:
Confirm that P(A ∩ B) = P(A) P(B), but P(A ∩ B | C) ≠ P(A | C) P(B | C).
In other words, two events that may be independent in a general population,
may not necessarily be independent in a particular subgroup of that population.
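The effect can be checked numerically once each of the eight diagram values is assigned to a region. The assignment in the sketch below is an assumption: it is one placement of the values .20, .20, .20, .05, .05, .05, .10, and .15 that is consistent with the exercise, and the original figure's layout may differ. The helper names are illustrative.

```python
from fractions import Fraction

# ASSUMPTION: a hypothetical placement of the eight diagram values, chosen
# to be consistent with the exercise. Keys are (in A, in B, in C).
pct = Fraction(1, 100)
regions = {
    (True, False, False): 20 * pct,   # A only
    (False, True, False): 20 * pct,   # B only
    (True, True, False): 20 * pct,    # A and B, not C
    (True, False, True): 5 * pct,     # A and C, not B
    (False, True, True): 5 * pct,     # B and C, not A
    (True, True, True): 5 * pct,      # A and B and C
    (False, False, True): 10 * pct,   # C only
    (False, False, False): 15 * pct,  # none of A, B, C
}

def prob(event):
    """Sum the region probabilities where event(in_a, in_b, in_c) holds."""
    return sum(p for key, p in regions.items() if event(*key))

def cond(event, given):
    """Conditional probability P(event | given)."""
    joint = prob(lambda a, b, c: event(a, b, c) and given(a, b, c))
    return joint / prob(given)

def in_a(a, b, c):
    return a

def in_b(a, b, c):
    return b

def in_c(a, b, c):
    return c

def in_a_and_b(a, b, c):
    return a and b

p_a, p_b, p_ab = prob(in_a), prob(in_b), prob(in_a_and_b)

print(p_ab == p_a * p_b)  # True: independent in the whole population
print(cond(in_a_and_b, in_c) == cond(in_a, in_c) * cond(in_b, in_c))  # False
```

Under this assumed placement, P(A ∩ B) = .25 = .50 × .50, while P(A ∩ B | C) = 1/5 and P(A | C) P(B | C) = 4/25. Swapping the .10 and .15 values between “C only” and “none” changes the conditional values but not the conclusion.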
More on Conditional Probability and Independent Events
Another example from epidemiology
[Two stylized Venn diagrams of S = POPULATION. First: A = lung cancer and B = obese,
overlapping in A ∩ B. Second: A = lung cancer and C = smoker, overlapping in A ∩ C.]
Suppose that, in a certain study population, we wish to investigate the prevalence of lung cancer
(A), and its associations with obesity (B) and cigarette smoking (C), respectively. From the first
of the two stylized Venn diagrams above, by comparing the scales drawn, observe that the
proportion of the size of the intersection A ∩ B (green) relative to event B (blue + green), is about
equal to the proportion of the size of event A (yellow + green) relative to the entire population S.
That is,
P(A ∩ B) / P(B) = P(A) / P(S).
(As an exercise, verify this equality for the following probabilities: yellow = .09, green = .07,
blue = .37, white = .47, to two decimals, before reading on.) In other words, the probability that a
randomly chosen person from the obese subpopulation has lung cancer, is equal to the probability
that a randomly chosen person from the general population has lung cancer (.16). This equation
can be equivalently expressed as
P(A | B) = P(A),
since the left side is conditional probability by definition, and P(S) = 1 in the denominator of
the right side. In this form, the equation clearly conveys the interpretation that knowledge of
event B (obesity) yields no information about event A (lung cancer). In this example, lung cancer
is equally probable (.16) among the obese as it is among the general population, so knowing that
a person is obese is completely unrevealing with respect to having lung cancer. Events A and B
that are related in this way are said to be independent. Note that they are not disjoint!
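The verification suggested in the paragraph above takes only a few lines of arithmetic. This sketch uses the four region probabilities given there, reading yellow as “A only”, green as A ∩ B, blue as “B only”, and white as neither, as described for the first diagram; the variable names are illustrative.

```python
# Region probabilities for the first Venn diagram.
yellow = 0.09  # lung cancer, not obese (A only)
green = 0.07   # lung cancer and obese (A ∩ B)
blue = 0.37    # obese, no lung cancer (B only)
white = 0.47   # neither

p_a = yellow + green        # P(A)
p_b = blue + green          # P(B)
p_a_given_b = green / p_b   # P(A | B) = P(A ∩ B) / P(B)

# Compare to two decimals, as the exercise specifies.
print(round(p_a, 2), round(p_a_given_b, 2))  # prints 0.16 0.16
```

The agreement holds only to two decimals (.07 / .44 is about .159), which is why the text calls the two proportions “about equal.”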
In the second diagram however, the relative size of A ∩ C (orange) to C (red + orange), is larger
than the relative size of A (yellow + orange) to the whole population S, so P(A | C) ≠ P(A), i.e.,
events A and C are dependent. Here, as is true in general, the probability of lung cancer is
indeed influenced by whether a person is randomly selected from among the general population
or the smoking subset, where it is much higher. Statistically, lung cancer would be a rare disease
in the U.S., if not for cigarettes (although it is on the rise among nonsmokers).
Application: “Are Blood Antibodies Independent?”
An example of conditional probability in human genetics
(Adapted from Rick Chappell, Ph.D., UW Dept. of Biostatistics & Medical Informatics)
Background: The surfaces of human red blood cells (“erythrocytes”) are coated with antigens
that are classified into four disjoint blood types: O, A, B, and AB. Each type is associated
with blood serum antibodies for the other types, that is,
• Type O blood contains both A and B antibodies.
  (This makes Type O the “universal donor”, but capable of receiving only Type O.)
• Type A blood contains only B antibodies.
• Type B blood contains only A antibodies.
• Type AB blood contains neither A nor B antibodies.
  (This makes Type AB the “universal recipient”, but capable of donating only to Type AB.)
In addition, blood is also classified according to the presence (+) or absence (−) of Rh factor
(found predominantly in rhesus monkeys, and to varying degrees in human populations; it
is important in obstetrics). Hence there are eight distinct blood groups corresponding to this
joint classification system: O+, O−, A+, A−, B+, B−, AB+, AB−. According to the American
Red Cross, the U.S. population has the following blood group relative frequencies:
Blood Type    Rh factor +    Rh factor −    Totals
O             .384           .077           .461
A             .323           .065           .388
B             .094           .017           .111
AB            .032           .007           .039
Totals        .833           .166           .999
From these values (and from the background information above), we can calculate the
following probabilities:
P (A antibodies) = P (Type O or B)
= P (O) + P (B)
= .461 + .111
= .572
P (B antibodies) = P (Type O or A)
= P (O) + P (A)
= .461 + .388
= .849
P (B antibodies and Rh+ ) = P (Type O+ or A+)
= P (O+) + P (A+)
= .384 + .323
= .707
Using these calculations, we can answer the following.
Question: Is having “A antibodies” independent of having “B antibodies”?
Solution: We must check whether or not
P(A and B antibodies) = P(A antibodies) × P(B antibodies),
i.e., whether P(Type O) = .572 × .849, or .461 = .486.
This indicates near independence of the two events; there does exist a slight
dependence. The dependence would be much stronger if America were
composed of two disjoint (i.e., non-interbreeding) groups: Type A (with B
antibodies only) and Type B (with A antibodies only), and no Type O (with
both A and B antibodies). Since this is evidently not the case, the implication is
that either these traits evolved before humans spread out geographically, or they
evolved later but the populations became mixed in America.
Question: Is having “B antibodies” independent of “Rh+”?
Solution: We must check whether or not
P (B antibodies and Rh+) = P (B antibodies) × P (Rh+),
that is,
.707 = .849 × .833,
which holds to three decimal places (.849 × .833 ≈ .707), so we have essentially
exact independence of these events. These traits
probably predate diversification in humans (and were not differentially selected
for since).
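Both independence checks above can be reproduced from the blood group table. The following is a minimal sketch under the background rules stated earlier (Types O and B carry A antibodies; Types O and A carry B antibodies); the function names are illustrative, and results are rounded to three decimals to match the precision of the table.

```python
# Relative frequencies from the blood group table, keyed by (blood type, Rh factor).
freq = {
    ("O", "+"): 0.384, ("O", "-"): 0.077,
    ("A", "+"): 0.323, ("A", "-"): 0.065,
    ("B", "+"): 0.094, ("B", "-"): 0.017,
    ("AB", "+"): 0.032, ("AB", "-"): 0.007,
}

def prob(event):
    """Total frequency of the blood groups satisfying event(blood_type, rh)."""
    return sum(p for (blood_type, rh), p in freq.items() if event(blood_type, rh))

def has_a_antibodies(blood_type, rh):
    return blood_type in ("O", "B")

def has_b_antibodies(blood_type, rh):
    return blood_type in ("O", "A")

def is_rh_positive(blood_type, rh):
    return rh == "+"

def compare(event1, event2):
    """Return (P(E1 ∩ E2), P(E1) x P(E2)), each rounded to 3 decimals."""
    joint = prob(lambda t, rh: event1(t, rh) and event2(t, rh))
    product = prob(event1) * prob(event2)
    return round(joint, 3), round(product, 3)

print(compare(has_a_antibodies, has_b_antibodies))  # prints (0.461, 0.486)
print(compare(has_b_antibodies, is_rh_positive))    # prints (0.707, 0.707)
```

The same `compare` helper can be applied to the exercises that follow, for example `compare(has_a_antibodies, is_rh_positive)`.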
Exercises:
• Is having “A antibodies” independent of “Rh+”?
• Find P (A antibodies | B antibodies) and P (B antibodies | A antibodies).
Conclusions?
• Is “Blood Type” independent of “Rh factor”? (Do a separate calculation for
each blood type: O, A, B, AB, and each Rh factor: +, −.)