Chapter 3. Conditional Probability and Independence
Section 3.1. Conditional Probability
Section 3.2. Independence

Jiaping Wang
Department of Mathematical Science
01/28/2013, Monday
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Outline
  Why Conditional Probability
  Definition of Conditional Probability
  Application of Conditional Probability
  Independence
  Applications of Independence

Part 1. Why Conditional Probability

Example

                                Employed   Unemployed     Total
Less than high school diploma     11,408        1,062    12,470
High School, No College           35,944        1,890    37,834
Some College, No Degree           21,284        1,014    22,298
Associate Degree                  11,693          447    12,141
Bachelor’s Degree and Higher      39,293        1,098    40,390
Total                            119,622        5,511   125,133

A common summary of these data is the “unemployment rate”, which is 5,511/125,133 = 4.4%. However, this overall rate tells us nothing about the association between unemployment and education. To study that association we need another kind of probability, called conditional probability.

Example (continued)

Now we compute the unemployment rate conditional on education level by dividing each count by its row total (for example, 1,062/12,470 = 8.5%). The following table shows that as the education level increases, the unemployment rate decreases.

                                Employed   Unemployed
Less than high school diploma      91.5%         8.5%
High School, No College            95.0%         5.0%
Some College, No Degree            95.5%         4.5%
Associate Degree                   96.3%         3.7%
Bachelor’s Degree and Higher       97.3%         2.7%

Example 3.1

Projected percentages of workers in the labor force for 2014 are shown in the table. How do the relative frequencies for the four ethnic groups compare between women and men?
Percentage of the labor force:

          Men   Women   Total
White     43%     37%     80%
Black      6%      6%     12%
Asian      3%      3%      6%
Other      1%      1%      2%
Total     53%     47%    100%

Relative frequencies within each sex:

          Men   Women
White     81%     79%
Black     11%     13%
Asian      6%      6%
Other      2%      2%
Total    100%    100%

If the population size is n, the relative frequency of white workers among men is (43% × n)/(53% × n) = 43/53 = 81%. The other relative frequencies are computed in the same way.

Reduced Sample Space

Another illustration: consider the probability that a family with two children has two girls. The sample space is S = {(boy, boy), (boy, girl), (girl, boy), (girl, girl)}, so P(two girls) = 1/4.

Now, if we are told that the family has at least one girl, what is the probability that the family has two girls? The sample space reduces to Sr = {(boy, girl), (girl, boy), (girl, girl)}, so P(two girls | at least one girl) = 1/3.

Working in the original sample space S with A = {two girls} and B = {at least one girl}, we have A∩B = A, so P(two girls | at least one girl) = P(A|B) = P(A∩B)/P(B) = (1/4)/(3/4) = 1/3.

Part 2. Definition of Conditional Probability

Definition 3.1

If A and B are any two events, then the conditional probability of A given B, denoted by P(A|B), is

    P(A|B) = P(A∩B)/P(B),

provided that P(B) > 0.

Notice that P(A∩B) = P(A|B)P(B) or P(A∩B) = P(B|A)P(A).

Conditional probability defined this way satisfies the three axioms of probability:
(1) A∩B is a subset of B, so P(A∩B) ≤ P(B), and hence 0 ≤ P(A|B) ≤ 1;
(2) P(S|B) = P(S∩B)/P(B) = P(B)/P(B) = 1;
(3) If A1, A2, … are mutually exclusive, then so are A1∩B, A2∩B, …, and P(∪Ai | B) = P((∪Ai)∩B)/P(B) = P(∪(Ai∩B))/P(B) = ∑P(Ai∩B)/P(B) = ∑P(Ai|B).

Example 3.2

There are four batteries and one is defective. Two are to be selected at random for use on a particular day. Find the probability that the second battery selected is not defective, given that the first was not defective.

Solution: Let N1 denote the event that the first battery selected is non-defective, and let N2 denote the event that the second battery selected is non-defective.
Label the batteries 1 to 4 and assume, without loss of generality, that battery 1 is the defective one. We are interested in P(N2|N1) = P(N1∩N2)/P(N1). Listing the 12 equally likely ordered pairs of selected batteries (shown in the figure on the original slide), we find P(N1) = 9/12 = 3/4 and P(N1∩N2) = 6/12 = 1/2, so P(N2|N1) = (1/2)/(3/4) = 2/3.

Part 3. Application of Conditional Probability

Screening Test

A screening test indicates the presence or absence of a particular disease. There are two different kinds of errors:
  False Positive: the test indicates that a person has the disease when he/she actually does not;
  False Negative: the test indicates that a person does not have the disease when he/she actually does.

Sensitivity: the probability that a person selected randomly from among those who have the disease will have a positive test.
Specificity: the probability that a person selected randomly from among those who do not have the disease will have a negative test.

Screening Test (continued)

                     True Diagnosis
                     +        -        Sum
Test Result   +      a        b        a+b
              -      c        d        c+d
Sum                  a+c      b+d      a+b+c+d = n

A + indicates the presence of the disease under study; a - indicates its absence.

Sensitivity = a/(a+c), specificity = d/(b+d).

Predictive value is the conditional probability that a randomly selected person actually has the disease, given that he/she tested positive: predictive value = a/(a+b).

A good test should have a high predictive value, but this is not always possible, because the predictive value depends on the prevalence: the proportion of the population under study that actually has the disease. Prevalence = (a+c)/n.

Example 3.3

Nucleic acid amplification tests (NAATs) are generally agreed to be better than non-NAATs for diagnosing the presence of Chlamydia trachomatis, the most prevalent sexually transmitted disease. The ligase chain reaction (LCR) test is one such test. In a large study, the sensitivity and specificity of LCR for women were assessed.
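The four measures defined for the general 2×2 screening table above can be sketched in Python as follows. This is only an illustration: the function name `screening_measures` and the example counts are hypothetical and are not taken from the slides or from any study.

```python
def screening_measures(a, b, c, d):
    """Compute screening-test measures from a 2x2 table.

    a = test +, disease +   (true positives)
    b = test +, disease -   (false positives)
    c = test -, disease +   (false negatives)
    d = test -, disease -   (true negatives)
    """
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),       # P(test + | disease +)
        "specificity": d / (b + d),       # P(test - | disease -)
        "predictive_value": a / (a + b),  # P(disease + | test +)
        "prevalence": (a + c) / n,        # P(disease +)
    }


# Hypothetical counts, chosen only to illustrate the call.
example = screening_measures(a=90, b=10, c=10, d=890)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Each measure is a conditional probability, so each denominator is the total of the row or column being conditioned on: a column total for sensitivity and specificity, and a row total for the predictive value.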
Following are the results:

                        Tissue Culture
                        +        -        Sum
LCR Test Result   +     139      84       223
                  -     13       1896     1909
Sum                     152      1980     2132

Example 3.3 (continued)

Assuming that the tissue culture is exact and that the women in the study constitute a random sample of women in the United States, answer the following questions:
a. What is the prevalence of Chlamydia trachomatis?
b. What is the sensitivity of LCR?
c. What is the specificity of LCR?
d. What is the predictive value of LCR?

Solutions:
a. prevalence = 152/2132 ≈ 0.071
b. sensitivity = 139/152 ≈ 0.914
c. specificity = 1896/1980 ≈ 0.958
d. predictive value = 139/223 ≈ 0.623

Part 4. Independence

Definition 3.2 and Theorem 3.2

Definition 3.2: Two events A and B are said to be independent if P(A∩B) = P(A)P(B). This is equivalent to stating that P(A|B) = P(A) and P(B|A) = P(B), provided the conditional probabilities exist.

Theorem 3.2: Multiplicative Rule. If A and B are any two events, then
    P(A∩B) = P(A)P(B|A) = P(B)P(A|B).
If A and B are independent, then
    P(A∩B) = P(A)P(B).

Example 3.4

Suppose that a foreman must select one worker from a pool of four available workers (numbered from 1 to 4) for a special job. He selects the worker by mixing the four names and randomly selecting one. Let A denote the event that worker 1 or 2 is selected, let B denote the event that worker 1 or 3 is selected, and let C denote the event that worker 1 is selected. Are A and B independent? Are A and C independent?

Solutions: S = {1, 2, 3, 4}, A = {1, 2}, B = {1, 3}, C = {1}. Assigning probability 1/4 to each individual worker gives P(A) = 1/2, P(B) = 1/2, P(C) = 1/4.
Since A∩B = {1}, P(A∩B) = 1/4 = (1/2)(1/2) = P(A)P(B), so A and B are independent.
Since A∩C = {1}, P(A∩C) = 1/4 ≠ 1/8 = P(A)P(C), so A and C are not independent.

Part 5.
Applications with Independence

Genetics Application

A unit of inheritance is a gene, which transmits chemical information that is expressed as a trait such as color or size. Two genes for each trait are present in each individual; these are called alleles. The two allelic genes in any one individual may be alike (homozygous) or different (heterozygous).

When two individuals mate, each parent contributes one of his/her two genes for each trait. In the simplest model, each of a parent's two genes is passed to the offspring with probability 1/2, and the two parents contribute their genes independently of each other.

Example 3.5

Blood type, the best known of the blood factors, is determined by a single allele. Each person has blood type A, B, AB or O. Type O represents the absence of a factor and is recessive to factors A and B. Thus a person with type A blood may be either homozygous (AA) or heterozygous (AO) for this allele; similarly, a person with type B blood may be either homozygous (BB) or heterozygous (BO). Type AB occurs if a person is given an A factor by one parent and a B factor by the other parent. To have type O blood, an individual must be homozygous O (OO).

Suppose a couple is preparing to have a child. One parent has blood type AB, and the other is heterozygous B (BO). What are the possible blood types of the child, and what is the probability of each?

Solutions: The AB parent passes on A or B, and the BO parent passes on B or O, each with probability 1/2 and independently. The four equally likely combinations are AB, AO, BB and BO, so there are three possible types: AB, B and A, with probabilities 1/4, 1/2 and 1/4, respectively.

Relay in Electrical Circuit

In a simple probability model, we assume the relays operate independently. There are two basic kinds of connections: series and parallel. Other structures are built from combinations of series and parallel connections.

Example 3.6

A section of an electrical circuit has two relays in parallel.
The relays operate independently, and when a switch is thrown, each will close properly with a probability of 0.8. If both relays are initially open, find the probability that current will flow from left to right when the switch is thrown.

Solutions: For each relay, let O denote that it stays open and C denote that it closes. Then there are four possible outcomes: E1 = {(O, C)}, E2 = {(O, O)}, E3 = {(C, O)}, E4 = {(C, C)}.

We know P(C) = 0.8, so P(O) = 0.2 for each relay. Because the relays operate independently,
    P(E1) = P(O)P(C) = 0.16,
    P(E2) = P(O)P(O) = 0.04,
    P(E3) = P(C)P(O) = 0.16,
    P(E4) = P(C)P(C) = 0.64.

No current flows through a relay that stays open, but in a parallel connection current flows if at least one relay closes. So the event of interest is A = E1∪E3∪E4. Since E1, E2, E3 and E4 are mutually exclusive,
    P(A) = P(E1) + P(E3) + P(E4) = 0.16 + 0.16 + 0.64 = 0.96.
Equivalently, P(A) = 1 - P(E2) = 1 - 0.04 = 0.96.
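The calculation in Example 3.6 can be checked with a short Python sketch that enumerates the outcomes and multiplies probabilities under the independence assumption. The function names below are hypothetical, and the series case is included only for comparison, assuming that current in a series connection flows only when every relay closes.

```python
from itertools import product

P_CLOSE = 0.8  # probability that a single relay closes (from Example 3.6)


def outcome_probability(states, p_close=P_CLOSE):
    """Probability of one outcome such as ('O', 'C'), assuming independent relays."""
    prob = 1.0
    for state in states:
        prob *= p_close if state == "C" else 1.0 - p_close
    return prob


def parallel_flow_probability(n_relays=2, p_close=P_CLOSE):
    """Current flows through a parallel section if at least one relay closes."""
    total = 0.0
    for states in product("OC", repeat=n_relays):
        if "C" in states:
            total += outcome_probability(states, p_close)
    return total


def series_flow_probability(n_relays=2, p_close=P_CLOSE):
    """Current flows through a series section only if every relay closes."""
    return outcome_probability("C" * n_relays, p_close)


print(round(parallel_flow_probability(), 2))  # E1, E3 and E4 summed
print(round(series_flow_probability(), 2))    # E4 only
```

Enumerating outcomes mirrors the slide's solution step by step. For larger numbers of relays, the complement form 1 - P(all relays stay open) gives the same parallel result with less work.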