Chapter 3 Probability

An experiment (random phenomenon) is an act or process of observation whose outcome is uncertain.
Eg: play a game with an opponent, spin a coin, roll a die, run a red light, measure a person's height.

The sample space of an experiment is the collection of all possible outcomes. The sample space is denoted S.
Eg: experiment = roll a six-sided die once. Then S = { 1, 2, 3, 4, 5, 6 }

An event is a collection of outcomes. Events are often denoted by letters near the beginning of the alphabet.
Eg: experiment = roll a six-sided die once. Let E be the event of rolling an even number. Let F be the event of rolling a number that's at least 5. Define E and F.
Soln: E = { 2, 4, 6 }, F = { 5, 6 }

The union of two events A and B, written as A or B, is the collection of all outcomes that are in A or B or both A and B; that is, outcomes that make A occur, or B occur, or both A and B occur. Union is denoted ∪ (or ∨).
Eg: E ∪ F = an even number or a number that's at least 5 = { 2, 4, 5, 6 }

The intersection of two events A and B, written as A and B, is the collection of all outcomes that are in both A and B; that is, outcomes that make both A and B occur. Intersection is denoted ∩.
Eg: E ∩ F = an even number that's at least 5 = { 6 }

A Venn diagram is a visual representation of the sample space and events of interest for an experiment.
Eg: Draw a Venn diagram for the die-rolling experiment. Show E, F, E ∪ F and E ∩ F.

The complement of an event A consists of the outcomes of the experiment that are not in A. It is denoted A^c, A' or Ā.
Eg: if E = even number, then E^c = odd number = { 1, 3, 5 }
Note: The complement of the sample space S is the empty set, denoted ∅. ∅ = { } = S^c

Two events are called mutually exclusive if they cannot happen at the same time; that is, if their intersection is empty.
Eg: E = even number, O = odd number. Then E ∩ O = ∅, so E and O are mutually exclusive.
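The die-rolling events above can be checked with a short sketch in Python, whose built-in sets support union, intersection and difference directly. The variable names are chosen here for illustration only.

```python
# Sample space and events for one roll of a six-sided die
S = {1, 2, 3, 4, 5, 6}
E = {n for n in S if n % 2 == 0}   # even number
F = {n for n in S if n >= 5}       # at least 5
O = {n for n in S if n % 2 == 1}   # odd number

union = E | F            # outcomes in E or F (or both)
intersection = E & F     # outcomes in both E and F
E_complement = S - E     # outcomes of S that are not in E

print(sorted(union))         # [2, 4, 5, 6]
print(sorted(intersection))  # [6]
print(sorted(E_complement))  # [1, 3, 5]
print(E.isdisjoint(O))       # True: E and O are mutually exclusive
print(S - S == set())        # True: the complement of S is the empty set
```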
The probability of an event is the likelihood that it will occur. The probability of an event E occurring is denoted P(E).
Eg: Roll a fair six-sided die once. E = even number. "Fair" tells us that the 6 sides are equally likely.
P(E) = P(even number) = .5
How do we know? E contains 3 of the 6 equally likely outcomes, so P(E) = 3/6.

Example: Brothers and sisters
28 respondents
18 people have one or more brothers (B)
17 people have one or more sisters (S)
3 people have no brothers and no sisters (Not B and Not S)

                       One or more brothers   No brothers   Total
One or more sisters             10                 7          17
No sisters                       8                 3          11
Total                           18                10          28

How is this table set up? First decide on the two basic questions (brother status and sister status). The individuals are the people in the room now. They are classified according to two variables:
• Brother status
• Sister status

Suppose that one person is randomly selected from those in the "room" now. "Randomly selected" tells us that everyone has the same chance of being chosen.
Find:
P(B) = 18/28
P(S) = 17/28
P(B ∩ S) = P(B and S) = 10/28
P(B ∪ S) = P(B or S) = (10+8+7)/28 = 25/28
P(B ∩ S^c) = P(B and Not S) = 8/28
P(B ∪ S^c) = P(B or Not S) = (10+8+3)/28 = 21/28
P(B^c ∩ S^c) = P(Not B and Not S) = 3/28
P(B^c ∪ S^c) = P(Not B or Not S) = (7+3+8)/28 = 18/28

Note: The Addition Rule is
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Check that the addition rule can be used to find P(B ∪ S) in the brothers and sisters example.
P(S ∪ B) = P(S) + P(B) − P(S ∩ B) = 17/28 + 18/28 − 10/28 = 25/28

Note: Probabilities are calculated from two numbers.
Bottom number (denominator): What's it out of?
Top number (numerator): How many (out of the bottom number) are there?

Exercise 3.64 p 143 Employee behaviour problems
A survey of human resource officers (HROs) at big companies was described in The Organisational Journal (Summer 2006).
55% of the human resource officers mentioned problems with absenteeism amongst employees, 41% had problems with turnover of employees and 22% had problems with both absenteeism and turnover.
(a) What's the probability that a randomly chosen HRO from this group had problems with absenteeism or turnover?
(b) What's the probability that a randomly chosen HRO from this group did not have problems with absenteeism?
(c) What's the probability that a randomly chosen HRO from this group had problems with neither absenteeism nor turnover?

Solution: Classify the human resource officers according to 2 variables. (What are they?)
Problems with turnover (T)
Problems with absenteeism (A)

           T     Not T   Total
A         22%     33%     55%
Not A     19%     26%     45%
Total     41%     59%    100%

Get rid of the % signs (think of 100 HROs):

           T     Not T   Total
A         22      33      55
Not A     19      26      45
Total     41      59     100

Now answer the questions.
(a) P(A or T) = (22+19+33)/100 = 74/100
(b) P(Not A) = 45/100
(c) P(Not A and Not T) = 26/100
(d) What is the probability that a randomly selected human resource officer from this group has at least one of the two problems (absenteeism and turnover)?
This is the same event as in (a): P(A or T) = (22+19+33)/100 = 74/100
(e) What is the probability that a randomly selected human resource officer from this group has only one of the two problems (absenteeism and turnover)?
P(A and Not T) + P(T and Not A) = (33+19)/100 = 52/100

Exercise 3.52 p141 Social networking.
The Pew Research Internet Group Project (Dec 2013) conducted a survey of adult internet users in the US. The survey found that the two most popular social network sites in the US (amongst adult users of the Internet) were LinkedIn and Facebook: 71% had a Facebook account, 22% had a LinkedIn account and 18% used both LinkedIn and Facebook.
a) Make an appropriate Venn diagram.
b) What's the chance that an adult internet user used at least one of LinkedIn and Facebook?
c) What's the chance that an adult internet user uses exactly one of the two sites LinkedIn and Facebook?

Solution:
(a) Draw a Venn diagram using Paint.
(b) Use the diagram in (a).
(c) Use the diagram in (a).

Table method:
What/who is being classified? Adult internet users in the US
What are they being classified according to? (ie what are the two variables?)
Facebook use (F)
LinkedIn use (L)

           L     Not L   Total
F         18      53      71
Not F      4      25      29
Total     22      78     100

Now use the table to do (b) and (c).
(b) P(L or F) = (18+4+53)/100 = 75/100
(c) P(exactly one of F and L) = (4+53)/100 = 57/100

Conditional Probability
A conditional probability is not out of everyone: it is conditional on some other event having happened, so it incorporates other information.

Brothers and sisters

                       One or more sisters   No sisters   Total
One or more brothers            10                8         18
No brothers                      7                3         10
Total                           17               11         28

Suppose we select one person randomly from the 28 in the room now. We find that this person has a brother. What's the probability that (s)he has a sister?
What is this out of?
It's out of the 18 people who have a brother.
P(S|B) = 10/18   (Answer *)

Suppose we select one person randomly from the 28 in the room now. We find that this person has a sister. What's the probability that (s)he has a brother?
P(B|S) = 10/17

Note: The conditional probability of A occurring, given that B has occurred, is denoted P(A/B) or P(A|B).
Useful formula:
P(A|B) = P(A ∩ B) / P(B)

Find P(Sister|Brother) using this formula.
From above: P(B) = 18/28 and P(B ∩ S) = P(B and S) = 10/28
We want P(sister | brother):
P(S|B) = P(S ∩ B) / P(B) = (10/28) / (18/28) = 10/18
Compare with Answer * above.
Note that P(S|B) is different from P(B|S)!

Exercise 3.82 p 155 Social robots
An article in The International Conference on Social Robots (Vol 6414, 2010) summarised aspects of the design of social robots as follows: a random sample of 106 social robots had 63 that were built with legs only, 20 that were built with only wheels, and 15 that were built without legs or wheels. If a social robot has wheels, what's the probability that it also has legs?

Solution:
Table method (use a table):
What is being classified? Social robots
What questions about them are of interest? Does it have legs? Does it have wheels?

Fill in the three given counts and the total; the remaining cell is 106 − 63 − 20 − 15 = 8.

           L                            Not L                        Total
W          8                            20 (wheels and not legs)      28
Not W     63 (legs and not wheels)      15                            78
Total     71                            35                           106

If a social robot has wheels, what's the probability that it also has legs?
What is this out of? (denominator) It's out of the robots that have wheels. There are 28 of them!
P(L|W) = 8/28

Formula method (use formulae):
A random sample of 106 social robots had 63 that were built with legs only, 20 that were built with only wheels, and 15 that were built without legs or wheels.
If a social robot has wheels, what's the probability that it also has legs?
Notation:
L = randomly chosen robot has legs
W = randomly chosen robot has wheels
What probabilities have been given?
P(L and Not W) = 63/106
P(W and Not L) = 20/106
P(Not L and Not W) = 15/106
What probability is wanted? P(L|W)
The four combinations must add to 1, so P(L and W) = 1 − (63+20+15)/106 = 8/106.
P(W) = P(L and W) + P(W and Not L) = 8/106 + 20/106 = 28/106
P(L|W) = P(L and W) / P(W) = (8/106) / (28/106) = 8/28, the same as the table method.

Exercise 3.160 p176 Ancient pottery.
Archeologists described (in Chance (Fall 2000)) 837 pieces of pottery found at the ancient Greek settlement of Phylakopi. 183 pieces of pottery were painted. Of these 183 painted pieces of pottery, 14 were painted in a curvilinear pattern, 165 were painted in a geometric design and 4 were painted in a naturalistic way.
a) Find the probability that a randomly selected piece of pottery is painted.
b) Given that a randomly selected piece of pottery is painted, what is the probability that it is painted in a curvilinear pattern?

Solution:
What is being classified? Pieces of pottery

Painted                      183
   Curvilinear     14
   Geometric      165
   Naturalistic     4
Not painted                  837 − 183 = 654
Total                        837

(a) P(painted) = 183/837
(b) P(curvilinear pattern | painted) = (14/837) / (183/837) = 14/183

Note: The so-called Multiplication Rule is
P(A ∩ B) = P(A|B) P(B)
This is the conditional probability formula with the quantities rearranged.

Exercise 3.166 (b) p176 Library card.
A Harris poll found that 68% of all American adults have a public library card. Also, 62% of men have library cards while 73% of women have library cards. Assuming that there are equal numbers of men and women in the US, what is the probability that a randomly chosen American adult is a woman who owns a library card?

Solution:
Table method:
What is being classified? American adults
What variables are we using to classify American adults?
Gender (M and F)
Library card status (L = yes, Not L = no library card)
Note that 62% and 73% are conditional! (They are not out of all adults.)
Conditional probabilities do NOT go into the table directly!
Start with 100 adults, 50 men and 50 women:

           M                  F                     Total
L          62% of 50 = 31     73% of 50 = 36.5       67.5
Not L      50 − 31 = 19       50 − 36.5 = 13.5       32.5
Total      50                 50                    100

In case we don't like half people, multiply by 10:

           M      F     Total
L         310    365     675
Not L     190    135     325
Total     500    500    1000

The 68% is redundant and misleading. They rounded 67.5% to 68%.

What is the probability that a randomly chosen American adult is a woman who owns a library card?
P(woman who owns a library card) = P(F and L) = 365/1000

Formulae:
What has been given (must use correct notation)? We'll ignore the 68% (redundant and misleading).
L = randomly chosen American adult has a library card
M = randomly chosen American adult is male
F = randomly chosen American adult is female
Is it M|L or L|M? "62% of men have library cards" is out of the men, so the condition is M:
P(L|M) = .62
P(L|F) = .73
P(M) = .5
P(F) = .5
What is wanted? P(F and L).
We can use the multiplication rule because we already know the conditional and unconditional probabilities.
P(F and L) = P(L|F) P(F) = .73 × .5 = .365, the same as before.

Exercise 3.81 p155 Blood diamonds
Global Research News (Mar 4, 2014) reported that one quarter of all rough diamonds produced in the world are blood diamonds (ie produced in a war zone in order to finance the activities of warlords, insurgencies or an invading army). 90% of all rough diamonds are processed in Surat, India. One-third of the diamonds processed in Surat are blood diamonds.
(a) What's the probability that a rough diamond is not a blood diamond?
(b) What's the probability that a rough diamond is processed in Surat and is a blood diamond?
Answer the questions given using probability rules. Then make a table. How do you know that at least one of the numbers provided is incorrect?

Solution using probability rules:
What has been given?
B = diamond is a blood diamond, P(B) = .25
S = diamond is processed in Surat, P(S) = .9
P(B|S) = 1/3
What is requested?
(a) What's the probability that a rough diamond is not a blood diamond?
P(Not B) = 1 − .25 = .75
(b) What's the probability that a rough diamond is processed in Surat and is a blood diamond?
The Multiplication Rule is P(A ∩ B) = P(A|B) P(B), so
P(S and B) = P(B|S) P(S) = (1/3) × (.9) = .3

Solution using table:
What is being classified? Diamonds
How are they being classified? Is it a blood diamond? Is it processed in Surat?

           B                     Not B    Total
S          1/3 of 90% = 30%      60%       90%
Not S      Box 2                 Box 5     10%
Total      25%                   75%      100%

Problem: Column 1 does not add up properly. 30% + Box 2 = 25% would need Box 2 = −5%, which is impossible. So at least one of the numbers is wrong!

Next: Independence
Events A and B are independent if
P(A|B) = P(A)
P(A|B) = the part of B occupied by A
P(A) = the part of S occupied by A
We can't see independence directly in a Venn diagram. If A and B are independent (and both have positive probability), then they must overlap.
A and B are independent if the chance of A happening is not affected by B happening; that is, the chance of A happening stays the same even when B has happened.
Eg: Roll a fair six-sided die several times.
P(six on the first roll) = 1/6
P(six on 2nd roll | six on first roll) = 1/6
Successive rolls are independent.

Independence vs mutually exclusive: Are they the same? NO.
Mutually exclusive means no overlap.
Independence means the probability of one event doesn't change when you know that the other has happened.

Example: Brothers and sisters. Are B and S independent?

                           B = One or more brothers   No brothers   Total
S = One or more sisters               10                   7          17
No sisters                             8                   3          11
Total                                 18                  10          28

Events S and B are independent if P(S|B) = P(S)
Or
Events S and B are independent if P(B|S) = P(B)
Let's do the first one:
P(S|B) = 10/18 = .5556
P(S) = 17/28 = .6071
We see that these two numbers are different, so S and B are not independent (ie they are dependent).

Example of independence:

               Sister(s)   No sisters   Total
Brother(s)         9            6         15
No brothers        3            2          5
Total             12            8         20

Are B and S independent? Check whether P(B) = P(B|S).
P(B) = 15/20 = .75
P(B|S) = 9/12 = .75
The part of all 20 taken up by B is the same as the part of S taken up by B, so B and S are independent.
OR
P(S) = 12/20 = .6
P(S|B) = 9/15 = .6

Exercise 3.86 p 155 Guilty decision-making
The Journal of Behavioural Decision Making (Jan 2007) describes an experiment to investigate how feeling guilty affects a person who has to make decisions. 171 students who volunteered for the experiment were randomly divided into three groups. Each group had to complete a reading or writing task which induced a particular emotional state: Guilt, Anger or Neutral. As soon as the task had been completed, the subjects were presented with a decision-making problem in which both options have predominantly negative features, for example, either spend money on repairing an old car or do nothing.
Notice that this is an experiment because the students were randomly divided into three groups.
(Note: Choose stated option = repair the car.)
a) Assuming that the respondent has been assigned to the guilty state, find the probability that he/she chooses to repair the car.
P(Repair | Guilty) = 45/57
b) Assuming that the respondent does not choose to repair the car, find the probability that he/she is in the anger state.
P(Anger | Not repair the car) = 50/111
c) Are the events "repair the car" and "guilty state" independent?
These events are independent if P(Repair | Guilty) = P(Repair).
P(Repair | Guilty) = 45/57 = .7895
P(Repair) = 60/171 = .3509
These two numbers are different, so the events "repair the car" and "guilty state" are dependent (ie not independent).

Exercise 3.93 p157 Red snapper (Using independence, trees)
Restaurants sometimes serve less expensive fish in place of expensive fish like red snapper. According to Nature (July 15, 2004), researchers at the University of North Carolina used DNA analysis to decide whether fish sold as red snapper by vendors across the US was actually red snapper. They found that 77% of the fish was not actually red snapper but some other cheaper fish which looked like red snapper.
a) Find the probability of being served red snapper if you order red snapper at a restaurant.
b) Find the probability that at least one out of five randomly selected restaurant customers who ordered red snapper will be served red snapper.

(a) P(red snapper) = 1 − .77 = .23
(b) Treat the five customers as independent. "At least one" is the complement of "none":
P(at least one of five is served red snapper) = 1 − P(none of the five is) = 1 − (.77)^5 ≈ 1 − .2707 = .7293
Also: Find the probability that at least one out of two customers is served snapper. Make a tree using Paint.
By the same complement argument, the tree should give 1 − (.77)^2 = 1 − .5929 = .4071.

Bayes' Theorem.
Suppose that B_1, B_2, …, B_k partition the sample space. A is any event in the sample space. Then

P(B_i|A) = P(A|B_i) P(B_i) / [ P(A|B_1) P(B_1) + P(A|B_2) P(B_2) + … + P(A|B_k) P(B_k) ]

Example: T-shirt factory
Betty makes 30% of the T-shirts; Tom makes 20% and Jane makes the rest (50%).
2% of Betty's T-shirts are defective; 3% of Tom's T-shirts are defective and 5% of Jane's T-shirts are defective.
Choose one T-shirt at random from the day's production.
(a) Find the probability that it's a defective T-shirt made by Betty.
(b) Find the probability that it's a defective T-shirt made by Jane.
(c) Find the probability that it's a defective T-shirt.
(d) Suppose you find that the chosen T-shirt is defective. What's the chance that Betty made it?
Method I: Table
What is being classified? T-shirts
What questions do we need to ask ourselves in order to classify the T-shirts?
Who made it?
Is it defective?

Suppose we have 1000 T-shirts.

            Betty             Tom               Jane               Total
Def         2% of 300 = 6     3% of 200 = 6     5% of 500 = 25       37
Not Def     294               194               475                 963
Total       300               200               500                1000

(a) Find the probability that it's a defective T-shirt made by Betty.
Ans: P(Def made by Betty) = 6/1000
(b) Find the probability that it's a defective T-shirt made by Jane.
Ans: P(Def made by Jane) = 25/1000
(c) Find the probability that it's a defective T-shirt.
Ans: P(Def) = 37/1000
(d) Suppose you find that the chosen T-shirt is defective. What's the chance that Betty made it?
Ans: P(Betty | Def) = 6/37

Method II: Tree. Use Paint.

Method III: Formulae
Given:
P(B) = .3, P(T) = .2, P(J) = .5
P(D|B) = .02, P(D|T) = .03, P(D|J) = .05
(a) Find the probability that it's a defective T-shirt made by Betty.
P(D and B) = P(D|B) P(B) = .02 × .3 = .006   (Mult Rule)
(b) Find the probability that it's a defective T-shirt made by Jane.
P(D and J) = P(D|J) P(J) = .05 × .5 = .025   (Mult Rule)
(c) Find the probability that it's a defective T-shirt.
Also P(D and T) = P(D|T) P(T) = .03 × .2 = .006   (Mult Rule)
P(D) = P(D and B) + P(D and T) + P(D and J) = .006 + .006 + .025 = .037   (Law of Total Prob)
(d) Suppose you find that the chosen T-shirt is defective. What's the chance that Betty made it?
P(B|D) = P(B and D) / P(D) = .006/.037 = 6/37
Or Bayes' Rule:
P(B) = .3, P(T) = .2, P(J) = .5
P(D|B) = .02, P(D|T) = .03, P(D|J) = .05

P(B|D) = P(D|B) P(B) / [ P(D|B) P(B) + P(D|T) P(T) + P(D|J) P(J) ]
       = (.02 × .3) / (.02 × .3 + .03 × .2 + .05 × .5)
       = .006/.037, as before.

Exercise 3.191 p181 Pregnancy tests
Suppose that 75% of all women who take a pregnancy test really are pregnant. There's a 2% chance that a particular pregnancy test gives a false positive result and a 99% chance that it gives a true positive result. Janet has received a positive result. What's the chance that she really is pregnant?
Solution:
Table:
Tree:

Exercise 3.139 p172 Athlete doping
Biostatisticians at the University of Texas demonstrated the use of Bayes' Rule in the context of testosterone abuse amongst Olympic athletes, reported in Chance (Spring 2004): Suppose that 100 out of 1000 athletes are using testosterone illegally. Suppose that 50 of the users would test positive for testosterone (ie 50 are true positives) and 9 of the non-users would also test positive (ie 9 are false positives).
a) Find the sensitivity of this drug test; that is, given that the athlete is a user, find the probability of a positive result.
b) Find the specificity of this drug test; that is, given that the athlete is a non-user, find the probability of a negative result.
c) Find the positive predictive value of this drug test; that is, find the probability that an athlete really is using testosterone illegally, given that he/she has received a positive test result.

Solution:
Which method? (Table or tree or formulae?) The table is simplest.
What are we classifying? Athletes
What questions will we ask about each athlete?
Does the athlete use testosterone? Use or not use
What is the test result?
Pos or neg

               Use testosterone   Don't use   Total
Pos result            50              9          59
Neg result            50            891         941
Total                100            900        1000

a) Find the sensitivity of this drug test; that is, given that the athlete is a user, find the probability of a positive result.
P(Pos | user) = 50/100 = .5
Not very sensitive, because it picks up only half of the users!
b) Find the specificity of this drug test; that is, given that the athlete is a non-user, find the probability of a negative result.
P(Neg result | non-user) = 891/900 = .99
c) Find the positive predictive value of this drug test; that is, find the probability that an athlete really is using testosterone illegally, given that he/she has received a positive test result.
P(Using | Pos result) = 50/59

Review Exercises Chapter 3
3.9, 3.11, 3.23, 3.47, 3.53, 3.55, 3.57, 3.59, 3.61, 3.71, 3.73, 3.79, 3.83, 3.85, 3.87, 3.91, 3.95, 3.97, 3.101, 3.141, 3.145, 3.163, 3.179, 3.181, 3.183, 3.189
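As a cross-check on the table method, the three conditional probabilities in Exercise 3.139 (athlete doping) can be recomputed from the four cell counts. This is only a sketch: the dictionary `counts` and the helper `conditional` are names made up here for illustration, and exact fractions are used so the answers match the hand calculation.

```python
from fractions import Fraction

# Cell counts from the athlete doping table: (testosterone status, test result)
counts = {
    ("user", "pos"): 50,
    ("user", "neg"): 50,
    ("non-user", "pos"): 9,
    ("non-user", "neg"): 891,
}

def conditional(counts, event, given):
    """P(event | given): the denominator counts only the cells where `given` holds."""
    denominator = sum(n for cell, n in counts.items() if given(cell))
    numerator = sum(n for cell, n in counts.items() if given(cell) and event(cell))
    return Fraction(numerator, denominator)

# Sensitivity = P(pos | user)
sensitivity = conditional(counts, lambda c: c[1] == "pos", lambda c: c[0] == "user")
# Specificity = P(neg | non-user)
specificity = conditional(counts, lambda c: c[1] == "neg", lambda c: c[0] == "non-user")
# Positive predictive value = P(user | pos)
ppv = conditional(counts, lambda c: c[0] == "user", lambda c: c[1] == "pos")

print(sensitivity)  # 1/2
print(specificity)  # 99/100
print(ppv)          # 50/59
```

The helper mirrors the question "What is this out of?": the `given` condition picks the denominator, and the numerator counts the part of it where the event also occurs.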