Basic Probability With an Emphasis on Contingency Tables

Students in PSYC 2101
• Skip to Slide # 7.

Random Variable
• A random variable is a real-valued function defined on a sample space.
  – The sample space is the set of all distinct outcomes possible for an experiment.
  – Function: the members of two sets (well-defined collections of objects) are paired so that each member of the one set (the domain) is paired with one and only one member of the other set (the range).
• The domain is the sample space; the range is a set of real numbers.
• A random variable is the set of pairs created by pairing each possible experimental outcome with one and only one real number.

Examples
• The outcome of rolling a die: one spot = 1, two spots = 2, three spots = 3, etc. (Each outcome has only one number, and vice versa.)
• The outcome of rolling a die, coded odd-even: one spot = 1, two spots = 2, three spots = 1, etc. (Each outcome has only one number, but not vice versa.)
• The weight of each student in my statistics class.

Probability Distribution
• Each value of the random variable is paired with one and only one probability.
• More on this later.

Probability Experiments
• A probability experiment is a well-defined act or process that leads to a single well-defined outcome.
  – Flip a coin: heads or tails.
  – Roll a die: how many spots are up.
  – Stand on a digital scale: what number is displayed.

Probability
• The probability of an event A, P(A), is the fraction of times that event will occur in an indefinitely long series of trials of the experiment.
• It cannot be known, but it can be estimated.

Estimating Probability
• Empirically: perform the experiment many times and compute relative frequencies.
• Rationally: make assumptions and then apply logic.
• Subjectively: the strength of an individual’s belief regarding whether an event will or will not happen, often expressed in terms of odds.

Odds of Occurrence of Event A
• If the odds of A are a to b and the experiment were performed (a + b) times, we would expect A to occur a times and the alternative, B (not A), to occur b times.
• There are 20 students in a class, 14 of whom are women.
  If we randomly select one, what are the odds that it will be a woman?
• 14 to 6 = 7 to 3.

Convert Odds to Probability
• Probability = a/(a + b).
• 14 women, 6 men.
• Odds = 7 to 3.
• Probability = 7 out of 10.

Convert Probability to Odds
• Odds = P(A)/P(not A)
• Probability = .70
• Odds = .70/(1 - .70) = 7 to 3

Independence
• Two events are independent iff (if and only if) the occurrence or non-occurrence of the one has no effect on the occurrence or non-occurrence of the other.
  – I roll a die twice. The outcome on the first roll has no influence on the outcome on the second roll.

Mutual Exclusion
• Two events are mutually exclusive iff the occurrence of the one precludes occurrence of the other (both cannot occur simultaneously on any one trial).
  – You could earn a final grade of A in this class.
  – You could earn a B.
  – You can’t earn both.

Mutual Exhaustion
• Two (or more) events are mutually exhaustive iff they include all possible outcomes.
  – You could earn a final grade of A, B, C, D, or F.
  – These are mutually exhaustive since there are no other possibilities.

Marginal Probability
• The marginal probability of event A, P(A), is the probability of A ignoring whether or not any other event has also occurred.
  – P(randomly selected student is female) = .70

Conditional Probability of A
• The probability that A will occur given that B has occurred.
• P(A|B), the probability of A given B.
  – Given that the selected student is wearing a skirt, the probability that the student is female is .9999
  – Unless you are in Scotland
• If P(A|B) = P(A), then A and B are independent of each other.

Joint Probability
• The probability that both A and B will occur.
• $P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$
• If A and B are independent, this simplifies to $P(A \cap B) = P(A)\,P(B)$
• This is known as the Multiplication Rule

The Addition Rule
• If A and B are mutually exclusive, the probability that one or the other will occur is the sum of their separate probabilities.
  Grade         A     B     C     D     F
  Probability   .2    .3    .3    .15   .05

  $P(A \cup B) = P(A) + P(B) = .2 + .3 = .5$
• If A and B are not mutually exclusive, things get a little more complicated.
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Two-Way Contingency Table
• A matrix where rows represent values of one categorical variable and columns represent values of a second categorical variable.
• Can be used to illustrate the relationship between two categorical variables.

Survey Questions
• We have asked each of 150 female college students two questions:
  1. Do you smoke (yes/no)?
  2. Do you have sleep disturbances (yes/no)?
• Suppose that we obtain the following data (these are totally contrived, not real):

                  Sleep?
  Smoke?      No    Yes   Total
  No          20     30      50
  Yes         40     60     100
  Total       60     90     150

Marginal Probabilities
  $P(\text{Smoke}) = \frac{100}{150} = \frac{10}{15} = \frac{2}{3} = .6\bar{6}$
  $P(\text{Sleep}) = \frac{90}{150} = \frac{9}{15} = \frac{3}{5} = .60$

Conditional Probabilities Show Absolute Independence
  $P(\text{Sleep} \mid \text{Smoke}) = \frac{60}{100} = \frac{3}{5} = .60$
  $P(\text{Sleep} \mid \text{Nosmoke}) = \frac{30}{50} = \frac{3}{5} = .60$

Multiplication Rule Given Independence
• Sixty of 150 have sleep disturbance and smoke, so $P(\text{Sleep} \cap \text{Smoke}) = 60/150 = .40$
• $P(A \cap B) = P(A) \times P(B)$
  $P(\text{Sleep} \cap \text{Smoke}) = P(\text{Sleep}) \times P(\text{Smoke}) = \frac{3}{5} \times \frac{2}{3} = \frac{6}{15} = .40$

“Sleep” = Sexually Active
• Preacher claims those who smoke will go to Hell.
• And those who fornicate will go to Hell.
• What is the probability that a randomly selected coed from this sample will go to Hell?

Addition Rule
  $P(\text{Sleep}) = \frac{90}{150} = \frac{9}{15} = .60$
  $P(\text{Smoke}) = \frac{100}{150} = \frac{10}{15} = .6\bar{6}$
  $P(\text{Sleep}) + P(\text{Smoke}) = \frac{9}{15} + \frac{10}{15} = \frac{19}{15} = 1.27$
• A probability cannot exceed one. Something is wrong here!

Welcome to Hell
• The events (sleeping and smoking) are not mutually exclusive.
• We have counted the overlap between sleeping and smoking (the 60 women who do both) twice.
• 30 + 40 + 60 = 130 of the women sleep and/or smoke.
• The probability we seek = 130/150 = 13/15 = .87

Addition Rule For Events That Are NOT Mutually Exclusive
  $P(\text{Sleep} \cup \text{Smoke}) = P(\text{Sleep}) + P(\text{Smoke}) - P(\text{Sleep} \cap \text{Smoke}) = \frac{9}{15} + \frac{10}{15} - \frac{6}{15} = \frac{13}{15} = .87$
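
As a minimal sketch (not part of the original slides), the marginal, conditional, joint, and union probabilities for the contrived smoking and sleep-disturbance table can be recomputed with exact fractions in Python. The dictionary keys and the helper names (`marginal`, `joint`, `conditional`, `union`) are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Counts from the contrived survey of 150 students.
# Keys are (smoking status, sleep status); the labels are illustrative only.
table = {
    ("no_smoke", "no_sleep"): 20,
    ("no_smoke", "sleep"): 30,
    ("smoke", "no_sleep"): 40,
    ("smoke", "sleep"): 60,
}
n = sum(table.values())  # 150 students in total


def marginal(label):
    """P(label), ignoring the other variable."""
    return Fraction(sum(c for key, c in table.items() if label in key), n)


def joint(a, b):
    """P(a and b): the share of cases carrying both labels."""
    return Fraction(sum(c for key, c in table.items() if a in key and b in key), n)


def conditional(a, given):
    """P(a | given) = P(a and given) / P(given)."""
    return joint(a, given) / marginal(given)


def union(a, b):
    """Addition rule for events that need not be mutually exclusive."""
    return marginal(a) + marginal(b) - joint(a, b)


p_sleep = marginal("sleep")           # 90/150
p_smoke = marginal("smoke")           # 100/150
p_both = joint("sleep", "smoke")      # 60/150
p_either = union("sleep", "smoke")    # 130/150

# Independence check: does conditioning on smoking change P(Sleep)?
independent = conditional("sleep", "smoke") == p_sleep

print(p_sleep, p_smoke, p_both, p_either, independent)
```

With counts of 30, 20, 40, and 60 instead (same key order), the conditional and marginal probabilities no longer match, so the `independent` check fails and the simple product `p_sleep * p_smoke` no longer equals the joint probability.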
Sleep = Sexually Active, Smoke = Use Cannabis

                  Sleep?
  Smoke?      No    Yes   Total
  No          30     20      50
  Yes         40     60     100
  Total       70     80     150

Marginal Probabilities
  $P(\text{Smoke}) = \frac{100}{150} = \frac{2}{3} = .6\bar{6}$
  $P(\text{Sleep}) = \frac{80}{150} = \frac{8}{15} = .5\bar{3}$

Conditional Probabilities Indicate Nonindependence
  $P(\text{Sleep} \mid \text{Smoke}) = \frac{60}{100} = .60$
  $P(\text{Sleep} \mid \text{Nosmoke}) = \frac{20}{50} = .40$

Joint Probability
• What is the probability that a randomly selected coed is both sexually active and a cannabis user?
• There are 60 such coeds, so the probability is 60/150 = .40.
• Now let us see if the multiplication rule works with these data.

Multiplication Rule
  $P(\text{Sleep} \cap \text{Smoke}) = P(\text{Sleep}) \times P(\text{Smoke}) = \frac{8}{15} \times \frac{2}{3} = \frac{16}{45} = .3\bar{5}$
• Oops, this is wrong. The joint probability is .40. We need to use the more general form of the multiplication rule.

Multiplication Rule NOT Assuming Independence
  $P(\text{Smoke} \cap \text{Sleep}) = P(\text{Smoke}) \times P(\text{Sleep} \mid \text{Smoke}) = \frac{2}{3} \times \frac{3}{5} = \frac{6}{15} = .40$
• Now that looks much better.

Actual Data From Jury Research
• Castellow, Wuensch, and Moore (1990), Journal of Social Behavior and Personality, 5, 547-562.
• Male employer sued for sexual harassment by female employee.
• Experimentally manipulated physical attractiveness of both litigants.

Effect of Plaintiff Attractiveness
• P(Guilty | Attractive) = 56/73 = 77%.
• P(Guilty | Not Attractive) = 39/72 = 54%.
• Defendant found guilty more often if plaintiff was attractive.

                             Guilty?
  Plaintiff Attractive?   No    Yes   Total
  No                      33     39      72
  Yes                     17     56      73
  Total                   50     95     145

Odds and Odds Ratios
• Odds(Guilty | Attractive) = 56/17
• Odds(Guilty | Not Attractive) = 39/33
• Odds Ratio = (56/17) / (39/33) = 2.79.
• Odds of guilty verdict 2.79 times higher when plaintiff is attractive.

Effect of Defendant Attractiveness
• P(Guilty | Not Attractive) = 53/70 = 76%.
• P(Guilty | Attractive) = 42/75 = 56%.
• The defendant was more likely to be found guilty when he was unattractive.

                             Guilty?
  Defendant Attractive?   No    Yes   Total
  No                      17     53      70
  Yes                     33     42      75
  Total                   50     95     145

Odds and Odds Ratio
• Odds(Guilty | Not Attractive) = 53/17.
• Odds(Guilty | Attractive) = 42/33.
• Odds Ratio = (53/17) / (42/33) = 2.45.
• Odds of guilty verdict about 2.45 times higher when defendant is unattractive.

Combined Effects of Plaintiff and Defendant Attractiveness
• Plaintiff attractive, Defendant not = 83% guilty.
• Defendant attractive, Plaintiff not = 41% guilty.
• Odds ratio = (83/17) / (41/59) = 7.03.
• When your attorney tells you to wear your Sunday best to trial, listen.

Odds Ratios and Probability Ratios
• Odds of Success
  – 90/10 = 9 for Antibiotic Group
  – 40/60 = 2/3 for Homeopathy Group
  – Odds Ratio = 9/(2/3) = 13.5
• Odds of Failure
  – 10/90 = 1/9 for Antibiotic Group
  – 60/40 = 1.5 for Homeopathy Group
  – Odds Ratio = 1.5/(1/9) = 13.5
• Notice that the odds ratio comes out the same with both perspectives.
• Probability of Success
  – 90/100 = .9 for Antibiotic Group
  – 40/100 = .4 for Homeopathy Group
  – Probability Ratio = .9/.4 = 2.25
• Probability of Failure
  – 10/100 = .1 for Antibiotic Group
  – 60/100 = .6 for Homeopathy Group
  – Probability Ratio = .6/.1 = 6
• Notice that the probability ratio differs across perspectives.

Another Example
• According to Medscape, 0.5% of the general population has narcissistic personality disorder (NPD).
• The rate is 20% among members of the US Military.

Odds Ratios
• Odds of NPD
  – Military: .2/.8 = .25
  – General: .005/.995 = .005025
  – Ratio: .25/.005025 = 49.75
• Odds of NOT NPD
  – Military: .8/.2 = 4
  – General: .995/.005 = 199
  – Ratio: 199/4 = 49.75

Probability Ratios
• Probability of NPD
  – Military: 20%
  – General: 0.5%
  – Ratio: 20/0.5 = 40.
• Probability of NOT NPD
  – Military: 80%
  – General: 99.5%
  – Ratio: .995/.8 = 1.24

Probability Distributions
• For a discrete variable, pair each value with the probability of obtaining that value.
• For example, I flip a fair coin five times. What is the probability for each of the six possible outcomes?
• The distribution may be given as a table, a chart, or a formula.

Probability Table
  Number of Heads    0     1      2      3      4     5
  Percent            3.1   15.6   31.2   31.2   15.6  3.1

Probability Chart
  [chart not reproduced]

Probability Formula
• y is the number of heads, n is the number of tosses, p is the probability of heads, q is the probability of tails.
  $P(Y = y) = \frac{n!}{y!\,(n-y)!}\; p^{y} q^{\,n-y}$

Continuous Variable
• There is an infinite number of values, so a table relating each value to a probability would be infinitely large.
• The probability of any exact value is vanishingly small.
• We can find the probability that a randomly selected case has a value between a and b.

Evolution of a Continuous Variable
• I’ll start with a histogram for a discrete variable.
• In each step I’ll double the number of values (and the number of bars).
• All the way up to an infinite number of values, with each bar infinitely narrow.
• Now one final step, to an uncountably large number of bars, each infinitely narrow, yielding a continuous, uniform distribution ranging from A to B.
• Now I do the same, but I start with a binomial distribution with p = .5 and three bars.
• Note that the bars are not all of equal height.
• Each time I split one, I lower the height of the tail-wards one more than the center-wards one.
• Now one final leap to a continuous (normal) distribution with an uncountably large number of infinitely narrow bars.

Random Sampling
• Sampling N data points from a population is random if every possible different sample of size N is equally likely to be selected.
• Random samples most often will be representative of the population.
• Our stats assume random sampling.

Y Random, X Not
  Probability of each possible sample of two from A, B, C, D:

  Sample    AB    AC    AD    BC    BD    CD
  X         1/2   0     0     0     0     1/2
  Y         1/6   1/6   1/6   1/6   1/6   1/6

Counting Rules
• PSYC 2101 students can skip the material in the rest of this slide show.

Arranging Y Things
• There are Y! ways to arrange Y different things.
• I am getting a four-scoop ice cream cone.
• Chocolate, Vanilla, Coconut, and Mint.
• How many different ways can I arrange these four flavors?
• 4! = 4(3)(2)(1) = 24.

Permutations
• If I have 10 different flavors, how many different ways can I select and arrange 4 different flavors from these 10?
  $\frac{N!}{(N-Y)!} = \frac{10!}{(10-4)!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6!} = 5040$

Combinations
• Same problem, but the order of the flavors does not count.
• There are Y! ways to arrange Y things, so just divide the number of permutations by Y!
  $\frac{N!}{(N-Y)!\,Y!} = \frac{10!}{6!\,4!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4 \cdot 3 \cdot 2} = 210$

Number of Different Strings
• $C^L$ = number of different strings
• C is the number of different characters available
• L is the length of the string.
• Ten different characters (0 – 9) and two-character strings
• $10^2$ = 100 different strings
• Use letters instead (A through Z)
• $26^2$ = 676 different strings
• Use letters and numbers
• $36^2$ = 1,296 different strings
• Use strings of length 1 or 2.
• 36 + 1,296 = 1,332 different strings
• Use strings of length up to 3.
• $36^3$ = 46,656 three-character strings + 1,332 one- and two-character strings = 47,988 different strings.
• Use lengths up to 4
• 1,679,616 + 47,988 = 1,727,604
• Use lengths up to 5
• 60,466,176 + 1,727,604 = 62,193,780
• Use strings of length up to 6
• 2,176,782,336 + 62,193,780 = 2,238,976,116 different strings
• That is over 2 BILLION different strings.
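
The counting rules above can be checked with a short Python sketch (not part of the original slides). It relies on `math.factorial`, `math.perm`, and `math.comb` from the standard library (the latter two need Python 3.8 or later). The function names are hypothetical, chosen for illustration. Because the binomial probability formula for the coin-flip example uses the same combinations count, a small `binomial_pmf` helper is included as well.

```python
import math


def arrangements(y):
    """Number of ways to arrange y different things: y!"""
    return math.factorial(y)


def ordered_selections(n, y):
    """Permutations: select and arrange y of n things, n! / (n - y)!"""
    return math.perm(n, y)


def unordered_selections(n, y):
    """Combinations: select y of n things when order does not count."""
    return math.comb(n, y)


def strings_up_to(c, max_len):
    """Number of strings of length 1..max_len over c characters: sum of c**L."""
    return sum(c ** length for length in range(1, max_len + 1))


def binomial_pmf(y, n, p):
    """P(Y = y) for n independent trials with success probability p."""
    q = 1 - p
    return math.comb(n, y) * p ** y * q ** (n - y)


if __name__ == "__main__":
    print(arrangements(4))               # four-scoop cone
    print(ordered_selections(10, 4))     # 4 of 10 flavors, order counts
    print(unordered_selections(10, 4))   # 4 of 10 flavors, order ignored
    for max_len in range(1, 7):
        print(max_len, strings_up_to(36, max_len))
    # Five flips of a fair coin, as percentages
    print([round(100 * binomial_pmf(y, 5, 0.5), 1) for y in range(6)])
```

Summing `c ** length` over all lengths mirrors the running totals in the slides, where each longer string length is added to the count of all shorter strings.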