
Chapter 3. Conditional Probability and Independence
Section 3.1. Conditional Probability
Section 3.2 Independence
Jiaping Wang
Department of Mathematical Science
01/28/2013, Monday
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Outline
Why Conditional Probability
Definition of Conditional Probability
Application of Conditional Probability
Independence
Applications of Independence
Part 1. Why Conditional Probability
Example
                                  Employed  Unemployed     Total
Less than high school diploma       11,408       1,062    12,470
High School, No College             35,944       1,890    37,834
Some College, No Degree             21,284       1,014    22,298
Associate Degree                    11,693         447    12,141
Bachelor’s Degree and Higher        39,293       1,098    40,390
Total                              119,622       5,511   125,133
A common summary of these data is the “unemployment rate”, which is
5,511/125,133 = 4.4%. However, this overall rate tells us nothing about the
association between unemployment and education. To study that association we
need another kind of probability, called conditional probability.
Example (continued)
Now we compute the unemployment rate conditional on education
level, dividing each count by the total for its education level.
The following table shows that as the education level
increases, the unemployment rate decreases.
                                  Employed  Unemployed
Less than high school diploma        91.5%        8.5%
High School, No College              95.0%        5.0%
Some College, No Degree              95.5%        4.5%
Associate Degree                     96.3%        3.7%
Bachelor’s Degree and Higher         97.3%        2.7%
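As a rough check, the conditional unemployment rates can be recomputed from the counts in the employment table. The sketch below is only an illustration: the dictionary name `counts` and the helper `unemployment_rate` are made up for this example, and each row total is taken as employed plus unemployed, which can differ by one from the printed totals because of rounding in the source data.

```python
# Counts copied from the employment table: (employed, unemployed) per education level.
counts = {
    "Less than high school diploma": (11408, 1062),
    "High School, No College": (35944, 1890),
    "Some College, No Degree": (21284, 1014),
    "Associate Degree": (11693, 447),
    "Bachelor's Degree and Higher": (39293, 1098),
}


def unemployment_rate(employed, unemployed):
    """Estimate P(unemployed | education level) as a row proportion."""
    return unemployed / (employed + unemployed)


# Overall (unconditional) unemployment rate.
total_unemployed = sum(u for _, u in counts.values())
total_people = sum(e + u for e, u in counts.values())
overall = total_unemployed / total_people

# Conditional rates, one per education level.
rates = {level: unemployment_rate(e, u) for level, (e, u) in counts.items()}

print(f"{'Overall':32s}{overall:6.1%}")
for level, rate in rates.items():
    print(f"{level:32s}{rate:6.1%}")
```

The conditional rates fall steadily from the first row to the last, while the single overall rate hides that pattern.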
Example 3.1
The projected percentages of workers in the labor force for
2014 are shown in the table. How do the relative
frequencies for the four ethnic groups compare
between women and men?
Share of the total labor force:
          Men     Women   Total
White     43%     37%     80%
Black     6%      6%      12%
Asian     3%      3%      6%
Other     1%      1%      2%
Total     53%     47%     100%
Relative frequencies within each gender:
          Men     Women
White     81%     79%
Black     11%     13%
Asian     6%      6%
Other     2%      2%
Total     100%    100%
If the population size is n, then the relative frequency of white workers
among men is (43%*n)/(53%*n) = 43/53 ≈ 81%. The other relative frequencies
are computed in the same way.
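The same division can be carried out for every cell. The following sketch assumes the joint percentages from Example 3.1 are stored in a dictionary named `joint` (an illustrative name) and divides each entry by its gender total.

```python
# Joint percentages of the labor force from Example 3.1: group -> (men, women).
joint = {
    "White": (43, 37),
    "Black": (6, 6),
    "Asian": (3, 3),
    "Other": (1, 1),
}

# Column totals: the share of men and of women in the labor force.
men_total = sum(men for men, _ in joint.values())
women_total = sum(women for _, women in joint.values())

# Relative frequency of each group within a gender = joint share / gender share.
within_gender = {
    group: (men / men_total, women / women_total)
    for group, (men, women) in joint.items()
}

print(f"{'':8s}{'Men':>6s}{'Women':>7s}")
for group, (p_men, p_women) in within_gender.items():
    print(f"{group:8s}{p_men:6.0%}{p_women:7.0%}")
```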
Reduced Sample Space
Another illustration: consider the probability that a family
with two children has two girls. The sample space is
S={(boy,boy), (boy,girl), (girl,boy), (girl,girl)}. Assuming the
four outcomes are equally likely, P(two girls)=1/4.
Now, if we are told that the family has at least one girl,
what is the probability that the family has two girls? The
sample space reduces to Sr={(boy,girl), (girl,boy), (girl,girl)},
so P(two girls | at least one girl)=1/3.
Working in the original sample space S instead, let A={two girls}
and B={at least one girl}. Then A∩B=A, and
P(two girls | at least one girl)=P(A|B)=P(A∩B)/P(B)=(1/4)/(3/4)=1/3.
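Both routes to 1/3 can be checked by enumeration. This is a minimal sketch that assumes the four outcomes in S are equally likely; the helper names `prob`, `two_girls`, and `at_least_one_girl` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space S: ordered pairs (first child, second child), assumed equally likely.
S = list(product(["boy", "girl"], repeat=2))


def prob(event, space):
    """Probability of an event (a predicate) when outcomes in `space` are equally likely."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def two_girls(outcome):
    return outcome == ("girl", "girl")


def at_least_one_girl(outcome):
    return "girl" in outcome


# Route 1: reduce the sample space to the outcomes with at least one girl.
Sr = [outcome for outcome in S if at_least_one_girl(outcome)]
p_reduced = prob(two_girls, Sr)

# Route 2: stay in S and apply P(A|B) = P(A∩B)/P(B).
p_a_and_b = prob(lambda o: two_girls(o) and at_least_one_girl(o), S)
p_definition = p_a_and_b / prob(at_least_one_girl, S)

print(p_reduced, p_definition)  # 1/3 1/3
```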
Part 2. Definition of Conditional Probability
Definition 3.1
If A and B are any two events, then the conditional
probability of A given B, denoted as P(A|B), is
P(A|B) = P(A∩B)/P(B),
provided that P(B)>0.
Notice that P(A∩B) = P(A|B)P(B) or P(A∩B) = P(B|A)P(A).
Conditional probability defined this way satisfies the three axioms of probability:
(1) A∩B is a subset of B, so 0≤P(A∩B)≤P(B), and hence 0≤P(A|B)≤1;
(2) P(S|B)=P(S∩B)/P(B)=P(B)/P(B)=1;
(3) If A1, A2, … are mutually exclusive, then so are A1∩B, A2∩B, …, and
P(∪Ai|B) = P((∪Ai)∩B)/P(B) = P(∪(Ai∩B))/P(B) = ∑P(Ai∩B)/P(B) = ∑P(Ai|B).
Example 3.2
There are four batteries and one is defective. Two are to be selected
at random for use on a particular day. Find the probability that the
second battery selected is not defective, given that the first was
not defective.
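Solution: Let N1 denote the event that the first battery
selected is non-defective, and N2 the event that the second battery
selected is non-defective. Label the batteries 1 to 4 and suppose
battery 1 is the defective one. We are interested in
P(N2|N1)=P(N1∩N2)/P(N1).
Listing the 4*3=12 equally likely ordered selections (as in the figure),
9 have a non-defective first battery and 6 have both batteries
non-defective, so P(N1)=9/12=3/4 and P(N1∩N2)=6/12=1/2.
Then P(N2|N1)=(1/2)/(3/4)=2/3.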
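The counting in Example 3.2 can be reproduced by listing every ordered selection of two batteries. In this sketch the labels "D", "G1", "G2", "G3" are hypothetical names for the defective battery and the three good ones, and all ordered selections are assumed equally likely.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels: one defective battery "D" and three good ones.
batteries = ["D", "G1", "G2", "G3"]

# All ordered selections of two distinct batteries (4 * 3 = 12 outcomes).
selections = list(permutations(batteries, 2))

# N1: the first battery is non-defective; N1∩N2: both are non-defective.
n1 = [s for s in selections if s[0] != "D"]
n1_and_n2 = [s for s in n1 if s[1] != "D"]

p_n1 = Fraction(len(n1), len(selections))
p_n1_and_n2 = Fraction(len(n1_and_n2), len(selections))

# Definition of conditional probability: P(N2|N1) = P(N1∩N2)/P(N1).
p_n2_given_n1 = p_n1_and_n2 / p_n1

print(p_n1, p_n1_and_n2, p_n2_given_n1)  # 3/4 1/2 2/3
```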
Part 3. Application of Conditional Probability
Screening Test
A screening test indicates the presence or absence of a
particular disease.
There are two different kinds of errors:
False Positive: The test indicates a person has the disease
when he/she actually does not;
False Negative: The test indicates a person does not have the
disease when he/she actually does.
Sensitivity: the probability that a person selected
randomly from among those who have the disease will
have a positive test.
Specificity: the probability that a person selected
randomly from among those who do not have the
disease will have a negative test.
Screening Test (continued)
                      True Diagnosis
                      +       -       Sum
Test Result    +      a       b       a+b
               -      c       d       c+d
Sum                   a+c     b+d     a+b+c+d=n
In the True Diagnosis columns, + indicates the presence of the disease under
study and - indicates its absence.
The sensitivity = a/(a+c), the specificity = d/(b+d).
Predictive value is the conditional probability that a randomly selected
person actually has the disease, given that he/she tested positive:
predictive value = a/(a+b).
A good test should have a high predictive value, but this is not always
possible, because the predictive value is affected by the prevalence: the
proportion of the population under study that actually has the disease.
Prevalence = (a+c)/n
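To see how prevalence affects the predictive value, note that a/n = sensitivity × prevalence and b/n = (1 − specificity) × (1 − prevalence), so a/(a+b) can be written in terms of these three quantities. The sketch below uses this identity; the function name `predictive_value` and the numbers 0.95 and the prevalence grid are hypothetical values chosen only for illustration, not data from the lecture.

```python
def predictive_value(sensitivity, specificity, prevalence):
    """Return P(disease | positive test) = a/(a+b), expressed as proportions of n."""
    true_positive = sensitivity * prevalence                # a/n
    false_positive = (1 - specificity) * (1 - prevalence)   # b/n
    return true_positive / (true_positive + false_positive)


# Hypothetical test: sensitivity and specificity both 0.95 (illustrative values only).
for prevalence in (0.01, 0.05, 0.20, 0.50):
    ppv = predictive_value(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:4.0%} -> predictive value {ppv:.3f}")
```

With the test itself held fixed, the predictive value rises as the disease becomes more common, which is why a test with good sensitivity and specificity can still have a low predictive value for a rare disease.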
Example 3.3
Nucleic acid amplification tests (NAATs) are generally
agreed to be better than non-NAATs for diagnosing the
presence of Chlamydia trachomatis, the most
prevalent sexually transmitted disease. The ligase
chain reaction (LCR) test is one such test. In a large
study, the sensitivity and specificity of LCR for women
were assessed. Following are the results:
                      Tissue Culture
                      +       -       Sum
LCR Test       +      139     84      223
Results        -      13      1896    1909
Sum                   152     1980    2132
Example 3.3 (continued)
Assuming that the tissue culture is exact and that the
women in the study constitute a random sample of
women in the United States, answer the following
questions:
a. What is the prevalence of Chlamydia trachomatis?
b. What is the sensitivity of LCR?
c. What is the specificity of LCR?
d. What is the predictive value of LCR?
Solutions: a. prevalence = 152/2132 ≈ 0.071
b. sensitivity = 139/152 ≈ 0.914
c. specificity = 1896/1980 ≈ 0.958
d. predictive value = 139/223 ≈ 0.623
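These four ratios follow directly from the cell counts a, b, c, d of the 2×2 table. The sketch below is one way to compute them; the function name `screening_summary` is an illustrative choice, and the counts are those of the LCR study in Example 3.3.

```python
from fractions import Fraction


def screening_summary(a, b, c, d):
    """Summaries of a 2x2 screening table.

    a: test +, disease +    b: test +, disease -
    c: test -, disease +    d: test -, disease -
    """
    n = a + b + c + d
    return {
        "prevalence": Fraction(a + c, n),
        "sensitivity": Fraction(a, a + c),
        "specificity": Fraction(d, b + d),
        "predictive value": Fraction(a, a + b),
    }


# Counts from the LCR table in Example 3.3.
lcr = screening_summary(a=139, b=84, c=13, d=1896)
for name, value in lcr.items():
    print(f"{name:17s}{float(value):.3f}")
```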
Part 4. Independence
Definition 3.2 and Theorem 3.2
Definition 3.2: Two events A and B are said to be
independent if
P(A∩B)=P(A)P(B).
This is equivalent to stating that
P(A|B)=P(A), P(B|A)=P(B)
whenever these conditional probabilities exist.
Theorem 3.2: Multiplicative Rule. If A and B are any two
events, then
P(A∩B) = P(A)P(B|A)
= P(B)P(A|B)
If A and B are independent, then
P(A∩B) = P(A)P(B).
Example 3.4
Suppose that a foreman must select one worker from a pool of four
available workers (numbered from 1 to 4) for a special job. He
selects the worker by mixing the four names and randomly
selecting one. Let A denote the event that worker 1 or 2 is selected,
let B denote the event that worker 1 or 3 is selected, and let C
denote the event that worker 1 is selected. Are A and B
independent? Are A and C independent?
Solutions: S={1,2,3,4}, A={1,2}, B={1,3}, C={1}.
Assigning probability 1/4 to each individual worker gives P(A)=1/2, P(B)=1/2,
P(C)=1/4. Since A∩B={1}, P(A∩B)=1/4=1/2*1/2=P(A)P(B), thus A and B are
independent. Since A∩C={1}, P(A∩C)=1/4 ≠ 1/8=P(A)P(C), so A and C are not
independent.
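The product rule in Definition 3.2 is easy to test mechanically on a finite sample space. This sketch represents events as Python sets and assumes each worker is equally likely to be chosen; the helper names `P` and `independent` are illustrative.

```python
from fractions import Fraction

# Sample space: the four workers, each assumed equally likely to be selected.
S = {1, 2, 3, 4}


def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))


def independent(e1, e2):
    """Check Definition 3.2: P(E1∩E2) == P(E1)P(E2)."""
    return P(e1 & e2) == P(e1) * P(e2)


A, B, C = {1, 2}, {1, 3}, {1}

print(independent(A, B))  # True
print(independent(A, C))  # False
```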
Part 5. Applications with Independence
Genetics Application
A unit of inheritance is a gene, which
transmits chemical information that is
expressed as a trait such as color or size.
Each individual carries two genes for each
trait, called alleles. The two alleles in any
one individual may be alike (homozygous) or
different (heterozygous). When two individuals
mate, each parent contributes one of his/her
two alleles for each trait.
In the simplest model, each of a parent's two
alleles is passed to the offspring with
probability 1/2, and the two parents
contribute alleles independently of each
other.
Example 3.5
Blood type, the best known of the blood factors, is determined by a single allele. Each
person has blood type A, B, AB or O. Type O represents the absence of a factor and is
recessive to factors A and B. Thus a person with type A blood may be either
homozygous (AA) or heterozygous(AO) for this allele; similarly, a person with type B
blood may be either homozygous (BB) or heterozygous (BO). Type AB occurs if a
person is given an A factor by a parent and a B factor by the other parent. To have type
O blood, an individual must be homozygous O (OO). Suppose a couple is preparing to
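have a child. One parent has blood type AB, and the other is heterozygous B. What are
the possible blood types for the child, and what is the probability of each?
Solutions: The AB parent passes on A or B, each with probability 1/2, and the BO
parent independently passes on B or O, each with probability 1/2. The four equally
likely genotypes are AB, AO, BB and BO, so there are three possible types: AB, B and A,
with probabilities 1/4, 1/2 and 1/4, respectively.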
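The blood-type probabilities can be verified by enumerating the allele each parent passes on. This sketch assumes the simple model described in the genetics slide (each allele passed with probability 1/2, parents independent); the function `blood_type` and the tuple encoding of genotypes are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def blood_type(genotype):
    """Map a pair of alleles to a blood type; O is recessive to A and B."""
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


parent1 = ("A", "B")  # blood type AB
parent2 = ("B", "O")  # heterozygous B

# Each (allele from parent 1, allele from parent 2) pair is equally likely.
weight = Fraction(1, len(parent1) * len(parent2))
distribution = Counter()
for genotype in product(parent1, parent2):
    distribution[blood_type(genotype)] += weight

for blood, p in sorted(distribution.items()):
    print(blood, p)
```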
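Relays in Electrical Circuits
In a simple probability model, we assume the relays are independent. There
are two basic kinds of connections: series and parallel. In a series
connection, current flows only if every relay closes; in a parallel
connection, current flows if at least one relay closes.
Other structures are built from combinations of series and parallel connections.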
Example 3.6
A section of an electrical circuit has two relays in parallel. The relays operate
independently, and when a switch is thrown, each will close properly with a
probability of 0.8. If both relays are initially open, find the probability that
the current will flow from left to right when the switch is thrown.
Solutions: Let O denote Open and C denote Closed. Then there are four possible
outcomes: E1={(O,C)}, E2={(O,O)}, E3={(C,O)}, E4={(C,C)}. We know that P(C)=0.8,
so P(O)=0.2 for each relay. Because the relays operate independently,
P(E1)=P(O)P(C)=0.16, P(E2)=P(O)P(O)=0.04, P(E3)=P(C)P(O)=0.16,
P(E4)=P(C)P(C)=0.64. Current flows if at least one relay closes, so the event of
interest is A=E1∪E3∪E4. Since E1, E2, E3 and E4 are mutually exclusive,
P(A)=P(E1)+P(E3)+P(E4)=0.16+0.16+0.64=0.96. Equivalently, P(A)=1-P(E2)=1-0.04=0.96.
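The same calculation generalizes to any number of independent relays. The sketch below assumes a series connection conducts only when every relay closes and a parallel connection conducts when at least one closes; the function names `series` and `parallel` are illustrative.

```python
import math


def series(*close_probs):
    """P(current flows) for independent relays in series: all must close."""
    return math.prod(close_probs)


def parallel(*close_probs):
    """P(current flows) for independent relays in parallel: at least one closes."""
    all_stay_open = math.prod(1 - p for p in close_probs)
    return 1 - all_stay_open


# Example 3.6: two relays in parallel, each closing with probability 0.8.
p_flow = parallel(0.8, 0.8)
print(f"{p_flow:.2f}")  # 0.96

# For comparison, the same two relays in series.
print(f"{series(0.8, 0.8):.2f}")  # 0.64
```

Because the functions return plain probabilities, combined structures can be evaluated by nesting calls, for example `series(parallel(0.8, 0.8), 0.8)` for a parallel pair followed by a third relay in series, provided all relays are independent.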