Unit 6: Probability

Math? Ugh! Why bother?
• You hear on TV that a gubernatorial candidate has a 5% lead over her opponent. Should you believe she’ll win?
• You’ve got sample data. How far might your average (or whatever) be off from the population average?
• You’ve got experimental data. It doesn’t seem to match the prevailing theory. How likely is it that you’ve found something new?

Probability
• Start with finite probability (“frequency theory”), to understand rules
  – finite number of possible results in “sample space”, usually equally likely
• Move to continuous probability, to include “normal curve”, etc.
  – sample space is all numbers [maybe in some interval]

Finite probabilities
• “Event”: some set of possible “outcomes”, i.e., values in the “sample space”
• Probability of an event (with equally likely outcomes): # of outcomes in the event (called “successful outcomes”) / # of all possible outcomes (expressed as fraction or %)
  – Ex: Roll a die. P(getting < 3) = 2/6 = 1/3.
• Idea: P(A) = fraction of times A would occur if experiment is repeated many times
• Equal likelihood of outcomes is important
  – Flip 2 quarters: are HH, TT, and “one of each” equally likely? No: one of each (HT or TH) is twice as likely as HH, or as TT
  – Similarly, roll 2 dice: 36 equally likely outcomes, and 4,4 is only half as likely as 3,5 (unless dice have different colors)

Immediate results
• P(A^c) = 1 - P(A) (A^c is the set of all outcomes not in A, the “complement” of A)
  – Ex: Roll one die. P(a four) = 1/6, so P(not a four) = 1 - 1/6 = 5/6
  – Ex: Flip a coin 5 times. P(no heads) = 1/32, so P(at least one head) = 31/32.
• 0 ≤ P(A) ≤ 1
Hard example: In 5-card draw, P(4-of-a-kind)

Example: The first box model
• A box contains tickets with the numbers 1, 1, 4, 4, 4, 7, 7, 7, 7, 12. Pick a ticket at random.
• P(1) = 2/10 = 20%
• P(7) = 4/10 = 40%
• P(not 7) = 60%
• P(1 or 12) = 3/10 = 30%
• P(even) = 4/10 = 40%

Boolean operations: And (I)
• Conditional probability P(A|B) (“probability of A given B”): Sample space is restricted to B (i.e., we know B has occurred).
Now compute probability of A.
  – Ex: Pick a card from a (straight) deck. (Face cards don’t include aces.)
    • P(♥) = 13/52 = 1/4
    • P(♥ | face card) = 3/12 = 1/4
  – Ex: Two cards dealt face down:
    • P(2nd is deuce) = 4/52 = 1/13
    • P(2nd is deuce | 1st is deuce) = 3/51 = 1/17

Boolean operations: And (II)
• Multiplication rule: P(A and B) = P(A)•P(B|A)
• “Independent events”: P(B|A) = P(B)
  – So with independent events, the multiplication rule becomes P(A and B) = P(A)•P(B)
  – Remark: If A is independent of B, then B is independent of A.

Examples
• Boxes of tickets
  – Box 1: A1,A2,A2,B1,B1,B2 : letters and numbers are not independent
  – Box 2: A1,A2,A2,B1,B2,B2 : letters and numbers are independent
• From box of 1, 1, 4, 4, 4, 7, 7, 7, 7, 12, pick two tickets:
  – with replacement: P(two 4’s) = (3/10)(3/10) and P(1 then 7) = (2/10)(4/10)
  – without replacement: P(two 4’s) = (3/10)(2/9) and P(1 then 7) = (2/10)(4/9)
• Die thrown 4 times
  – Which is more likely, 3333 or 1436?
  – What is P(all 4 scores ≤ 2)?

Boolean operations: And (III)
• Ex: Caucasian woman with blonde ponytail snatched purse, jumped into yellow car driven by black man with mustache and beard. Man and woman fitting description arrested. At trial, prosecutor says the probabilities are: yellow car, 1/10; man with mustache, 1/4; woman with ponytail, 1/10; woman with blonde hair, 1/3; black man with beard, 1/10; interracial couple in car, 1/1000. So chances are 1/(10•4•10•3•10•1000) = 1/12,000,000 that they are the wrong people. (???)

Boolean operations: Or (I)
• Addition rule: P(A or B (or both)) = P(A) + P(B) - P(A and B)
• “Mutually exclusive events”: P(A and B) = 0; i.e., if one occurs, the other cannot
  – With mutually exclusive events, the addition rule becomes P(A or B) = P(A) + P(B)

Examples
• Pick a card.
  – P(A or K) = 4/52 + 4/52 = 2/13
  – P(A or ♠) = 4/52 + 13/52 - 1/52 = 4/13
• Tickets 1-100 in a box, draw one.
  – P(≤ 10 or ≥ 90) = 10/100 + 11/100 = 21/100
  – P(≤ 10 or div by 5) = 10/100 + 20/100 - 2/100 = 7/25

Expected values
• Ex 1: Flip a coin 10 times, paying $1 to play each time. You win $0.50 (plus your $1) if you get a head. How much should you expect to win?
• Ex 2: Roll two dodecahedral (12-sided) dice. You win $10 (plus your payment to play) if you get doubles. How much should you pay to play for a fair game?

Two similar examples:
• From text: Paradox of the Chevalier de Méré: P(at least 1 ace in 4 rolls of die) > P(at least 1 double-ace in 24 rolls of 2 dice)
• Birthday problem: With 30 people in a room, how likely is it that at least two have the same birth date?

The Birthday problem
# people   P(no match)   P(match)
 2         0.99726027    0.00273973
 3         0.99179583    0.00820417
 4         0.98364409    0.01635591
 5         0.97286443    0.02713557
 6         0.95953752    0.04046248
 7         0.9437643     0.0562357
 8         0.92566471    0.07433529
 9         0.90537617    0.09462383
10         0.88305182    0.11694818
11         0.85885862    0.14114138
12         0.83297521    0.16702479
13         0.80558972    0.19441028
14         0.77689749    0.22310251
15         0.74709868    0.25290132
16         0.71639599    0.28360401
17         0.68499233    0.31500767
18         0.65308858    0.34691142
19         0.62088147    0.37911853
20         0.58856162    0.41143838
21         0.55631166    0.44368834
22         0.52430469    0.47569531
23         0.49270277    0.50729723
24         0.46165574    0.53834426
25         0.4313003     0.5686997
26         0.40175918    0.59824082
27         0.37314072    0.62685928
28         0.34553853    0.65446147
29         0.31903146    0.68096854
30         0.29368376    0.70631624
31         0.26954537    0.73045463
32         0.24665247    0.75334753
33         0.22502815    0.77497185
34         0.20468314    0.79531686

Tree diagram
Flip a coin, then roll a die, list all alternatives

The Monty Hall Problem
(From Marilyn vos Savant’s column)
Game show: Three doors hide a car and 2 goats. Contestant picks a door. Host opens one of the other doors to reveal a goat. Contestant then may switch to the other unopened door. Is it better to stay with the original choice or to switch; or doesn’t it matter?
Marilyn’s answer: Switch!
Many respondents: Doesn’t matter.
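One way to check Marilyn's answer is to list every equally likely game, in the spirit of the tree diagrams in this unit. The sketch below is only an illustration, not part of the original column. It assumes the car is equally likely to be behind each door, the contestant's first pick is equally likely to be any door, and the host always opens a goat door among the other two (choosing at random when both hide goats). The names `DOORS` and `monty_hall` are made up for this example.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)  # hypothetical door labels


def monty_hall():
    """Return (P(stayer wins), P(switcher wins)) by exact enumeration."""
    stay = Fraction(0)
    switch = Fraction(0)
    # Assumption: all 9 (car position, first pick) pairs are equally likely.
    for car, pick in product(DOORS, DOORS):
        p_case = Fraction(1, 9)
        # The host may open any door that is neither the pick nor the car.
        host_options = [d for d in DOORS if d != pick and d != car]
        for opened in host_options:
            # Assumption: the host chooses at random among allowed doors.
            p = p_case / len(host_options)
            # The door the contestant would switch to.
            other = next(d for d in DOORS if d != pick and d != opened)
            if pick == car:
                stay += p
            if other == car:
                switch += p
    return stay, switch


stay, switch = monty_hall()
print("P(stayer wins)   =", stay)
print("P(switcher wins) =", switch)
```

Under these assumptions the stayer wins only when the first pick was the car, which is why switching comes out ahead.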
(“You’re the goat!”)

Tree diagram of Stayer’s possible games

Binomial coefficients (I)
• How many ways are there to choose k things (without regard to order) from a set of n things?
  – How many ways are there to choose 3 club officers from a set of 5 to get funded for a trip to a convention?
  – How many ways are there to choose 2 cards out of the 4 of a given rank to form a pair?
  – How many ways are there to choose, out of 8 replications of an experiment, 6 to be successful?

Ways to arrange 3 letters taken from {a,b,c,d,e}
abc abd abe acb acd ace adb adc ade aeb aec aed
bac bad bae bca bcd bce bda bdc bde bea bec bed
cab cad cae cba cbd cbe cda cdb cde cea ceb ced
dab dac dae dba dbc dbe dca dcb dce dea deb dec
eab eac ead eba ebc ebd eca ecb ecd eda edb edc

Ways to arrange 3 given letters: a, b, c
abc acb bac bca cab cba

Binomial coefficients (II)
• Step one: How many ways are there to choose k things in order from a set of n things?
  – n(n-1)(n-2)...(n-k+1)
• Step two: How many ways are there to order k given things?
  – k(k-1)(k-2)...1
• Step three: Divide.
  – C(n,k) = [n(n-1)(n-2)...(n-k+1)]/[k(k-1)(k-2)...1]
• Notation: n! = n(n-1)(n-2)...1 [1 if n=0]
  – C(n,k) = n!/[k! (n-k)!]

Binomial coefficients (I) revisited
• How many ways are there to choose k things (without regard to order) from a set of n things? C(n,k)
  – How many ways are there to choose 3 club officers from a set of 5 to get funded for a trip to a convention? C(5,3) = 10
  – How many ways are there to choose 2 cards out of the 4 of a given rank to form a pair? C(4,2) = 6
  – How many ways are there to choose, out of 8 replications of an experiment, 6 to be successful? C(8,6) = 28

Binomial probabilities (I)
• General question: Suppose an experiment is carried out n times under the same conditions. A given event (set of outcomes) A has probability p. What is the probability that A occurs exactly k times out of the n repetitions?
  – Ex: Roll a die 5 times. Probability of getting exactly three 4’s?
    • There are exactly C(5,3) = 10 patterns of three 4’s and two non-4’s
    • Each has probability (1/6)^3 (5/6)^2
    • So the answer is 10 (1/6)^3 (5/6)^2
• In general, the answer is C(n,k) p^k (1-p)^(n-k)
• Requirements for binomial probability:
  – (1) Experiment has 2 complementary outcomes.
  – (2) On repeated trials, probabilities don’t change.
• Though the formula gives the probability of exactly k “successes” out of n repetitions of an experiment, we will usually use it to find the probability of at least k “successes” out of n
  – So we have to add up the probabilities for k and k+1 and k+2 and ... and n.

Examples of binomial distributions (I)
• In a family of 5 kids, P(exactly 3 girls)
• Roll a die 15 times, P(exactly 4 twos)
• Roll two dice 10 times, P(at most two sums of 5)

Examples of binomial distributions (II)
• Feed vitamin A to one each of 10 pairs of rats, then all run a maze. In 7 pairs, the A-rat was faster. If vitamin A was no help (i.e., each rat was equally likely to be faster), how likely is it that, just by chance, the A-rat was faster in at least 7 pairs?
• In a county that is 40% Caucasian, how likely is it that a jury pool of 20 people has 18 or more Caucasians?

Math stuff about binomial coefficients
• They’re called that because they are the coefficients of the terms in x and y in the expansion of (x+y)^n:
  – C(n,0)x^n + C(n,1)x^(n-1)y + C(n,2)x^(n-2)y^2 + ... + C(n,n-1)xy^(n-1) + C(n,n)y^n
• For small n, compute C(n,k) with “Pascal’s triangle”: 1’s in first row and column, then each entry is the sum of the one above and the one to the left

(More from Marilyn vos Savant’s column)
Suppose we assume that 5% of the people are drug users. A drug test is 95% accurate (i.e., it produces a correct result 95% of the time, whether the person is using drugs or not). A randomly chosen person tests positive. Is the person highly likely to be a drug user?
Marilyn’s answer: Given your conditions, once the person has tested positive, you may as well flip a coin to determine whether she or he is a drug user. The chances are only 50-50.
But the assumptions, the make-up of the test group and the true accuracy of the tests themselves are additional considerations.
(To see this, suppose the population is 10,000 people; compare numbers of false positives and true positives.)

Drug [disease] testing probabilities
Drug [disease] present?   Test positive    Test negative    Sum
Yes                       “Sensitivity”    False negative   1
No                        False positive   “Specificity”    1

Ex: Suppose the Bovine test for lactose abuse has a sensitivity of 0.99 and a specificity of 0.95; and that 7% of a certain population abuses lactose. If a person tests positive on the Bovine test, how likely is it that (s)he really abuses lactose?
          pos    neg
abuser    .99    .01
clean     .05    .95

[Figure: plot of z = P(pos test => pos) against x = sensitivity and y = specificity, assuming 7% of population is really positive. Curve: x = .99. Points: x = .99, y = .95, z = .6; and x = .99, y = .90, z = .42.]

Counting dragonflies
(thanks to Profs. V. MacMillen and R. Arnold)

Only two pairs
• 30 censuses altogether, 17 with only two pairs
• Of 17, 12 had both in same plot
• Do they prefer to lay eggs in proximity?

Censuses with >2 pairs
P1 0 0   P2 0 3   P3 3 0
P1 4 0   P2 0 0   P3 0 4
P1 P2 P3
3 0 2 0 0 5 3 1 1 0 0 2 0 2 0 3 4 0 1 0 4 0 0 0 0 3 2 3 1 3 2 1 0 3 1 2 0 2 1 0 0 0 0 2 2 2 1 3 3 0 1 1 0 1 2 0 2 0 3 0 1 1 0 2 4 2 0 0 1 0 4 5
Up to 12 at the same time
With 3 pairs
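
The dragonfly question for the 17 two-pair censuses can be framed with the binomial probabilities from earlier in this unit. The sketch below is one possible framing, not the analysis of Profs. MacMillen and Arnold. It assumes (this is not stated in the census data) that with no preference each pair lands in one of the three plots P1, P2, P3 independently and with equal chance, so that P(both in same plot) = 1/3 for each census. The helper `binomial_tail` is a made-up name; the same helper also handles the rat example (at least 7 of 10 with p = 1/2).

```python
from math import comb


def binomial_tail(n, k, p):
    """P(at least k successes in n independent trials, each with probability p).

    Adds up C(n,j) p^j (1-p)^(n-j) for j = k, k+1, ..., n.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Assumption (not from the census data): 3 equally likely plots, pairs
# choose independently, so both pairs share a plot with probability 1/3.
P_SAME_PLOT = 1 / 3

# 17 censuses with exactly two pairs; in 12 of them both used the same plot.
dragonfly_chance = binomial_tail(17, 12, P_SAME_PLOT)

# Rat example from the slides: A-rat faster in at least 7 of 10 pairs, p = 1/2.
rat_chance = binomial_tail(10, 7, 0.5)

print(f"P(at least 12 of 17 same-plot censuses by chance) = {dragonfly_chance:.4f}")
print(f"P(A-rat faster in at least 7 of 10 pairs by chance) = {rat_chance:.4f}")
```

A small value for the dragonfly probability would suggest that chance alone is a poor explanation, but only under the equal-plots assumption above; if some plots are more attractive for other reasons, 1/3 is the wrong baseline.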