Probability

Role of probability in statistics:
- Gather data by a probabilistic (random) mechanism.
- Use probability to predict the results of the experiment under assumptions.
- Compute the probability of an error larger than a given amount.
- Compute the probability of a given departure between prediction and results under the assumptions.
- Decide whether or not the assumptions are likely realistic.

Outline in the randomized clinical trial context: the Salk vaccine.

Suppose the vaccine is useless: no cases of polio prevented, none caused. Between treatment and control there were a total of 56 + 142 = 198 cases. Each child was assigned to treatment or control with chance 1/2. Is a 56 to 142 split likely in 198 coin tosses? The actual chance of 56 or fewer heads in 198 tosses is about 4 × 10^-10 (4 in 10 billion).

Conclusion: random assignment is almost certainly not the explanation for the difference.

Chance of exactly 56 heads? About 2.6 × 10^-10.
Chance of exactly 99 heads? About 0.056.
No outcome is particularly likely.

The evidence for an effective vaccine is the low number of cases in the treatment group. We judge the evidence against the hypothesis of no vaccine effect by computing:

  the probability of getting evidence against the hypothesis as strong as, or stronger than, the evidence we did get, assuming the hypothesis is true.

Literary technique: foreshadowing. We come back to this logic in Chapter 14.

Where did all these numbers come from? Rules and jargon of probability:

Do a chance (random) experiment: toss a thumbtack onto each of two screens. Possible outcomes: each tack can land point up or tipped over.

The Sample Space

  Outcome number:  1     2     3     4
  Left screen:     up    up    over  over
  Right screen:    up    over  up    over
  Shorthand:       UU    UO    OU    OO

The sample space has 4 elements: {UU, UO, OU, OO}.

Probability: assign to each possible outcome a number, the probability of that outcome. Rules for the numbers:
- all probabilities are greater than or equal to 0;
- all probabilities are less than or equal to 1;
- the probabilities of all the possible outcomes must add up to 1.
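The tail probabilities quoted above can be checked directly. A minimal sketch in Python, using only the standard library and exact integer arithmetic (the function name is my own):

```python
from fractions import Fraction
from math import comb

def tail_prob(n, k):
    """Exact chance of k or fewer heads in n tosses of a fair coin."""
    # Sum the number of ways to get 0, 1, ..., k heads, then divide by 2^n.
    return Fraction(sum(comb(n, j) for j in range(k + 1)), 2 ** n)

p_tail = float(tail_prob(198, 56))    # chance of 56 or fewer heads, about 4e-10
p_exact = comb(198, 56) / 2 ** 198    # chance of exactly 56 heads, about 2.6e-10
p_middle = comb(198, 99) / 2 ** 198   # chance of exactly 99 heads, about 0.056
```

Even the exact calculation is a few lines once the counting is delegated to `math.comb`; the point of the later normal approximation is that tables let you get the same answers without a computer.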
We assign probabilities to other things, for example: the probability that the two thumbtacks land opposite to each other is P(UO) + P(OU). (P is for probability.)

BUT: how do we assign the four numbers? Make simple assumptions and use the rules of probability.

A simpler experiment: toss a coin onto each screen. Possible outcomes: {HH, HT, TH, TT}. Use the interpretation of probability: P(HH) is the long-run fraction of times HH occurs. In the long run, which of the four possibilities should occur most often? None: H should be about as common as T on each screen, and when H shows up on the left screen, H and T should still be equally common on the right screen. Argument by symmetry: all four outcomes have the same chance. So:

  P(HH) = P(HT) = P(TH) = P(TT) = 1/4

Does the same go for thumbtacks? NO: a thumbtack is not symmetric in the same way. But suppose the long-run fraction of U on the left were, say, 1/3 and the long-run fraction of U on the right were, say, 1/4. (Different tacks!) Then the long-run fraction of UU would be

  1/3 × 1/4.

"Of the trials where the left tack turns out U, we should see 1/4 have U on the right screen: there is no contact or influence between screens." Jargon: the left and right screen results are independent.

This would lead to:

  Outcome:  UU    UO    OU    OO
  Prob:     1/12  3/12  2/12  6/12

Where do the 1/3 and 1/4 come from?
- Experimentation: empirical measurement.
- Hypothesis / theory, as in genetics.
- Symmetry (not in this example, though).
- Physics, maybe.
Only the empirical method is likely to work in this experiment.

Return to the Salk trial. If there is no vaccine effect, then the number of cases in treatment is like the number of heads in 198 tosses of a fair coin. (Treat the number of cases as predestined.) How do we compute the chance of 56 heads in 198 tosses?

Method 1: exact calculation based on the rules of probability. This leads to the Binomial distribution, and gives the answer

  [198 × 197 × ··· × 1] / [(56 × ··· × 1) × (142 × ··· × 1)] × (1/2)^198.

Commentary: uselessly difficult to compute, even with a calculator. Care is needed to compute it accurately on a computer.
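The thumbtack table follows from the multiplication rule; a quick check in Python, where the 1/3 and 1/4 are the assumed long-run fractions from the text:

```python
from fractions import Fraction

p_left_up = Fraction(1, 3)    # assumed long-run fraction of U on the left screen
p_right_up = Fraction(1, 4)   # assumed long-run fraction of U on the right screen

# Independence: multiply the chance for the left screen by the chance for the right.
probs = {
    "UU": p_left_up * p_right_up,
    "UO": p_left_up * (1 - p_right_up),
    "OU": (1 - p_left_up) * p_right_up,
    "OO": (1 - p_left_up) * (1 - p_right_up),
}
```

Using `Fraction` keeps the arithmetic exact, so the four entries come out as 1/12, 3/12, 2/12, and 6/12 and sum to exactly 1, as the rules require.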
We can do just as well with a normal approximation. Method 2: use the central limit theorem. Next: some details of Method 1, then details of Method 2.

Some examples to illustrate the ideas underlying the binomial distribution. Toss a coin twice.

The Sample Space

  Toss 1  Toss 2  # Heads
  T       T       0
  T       H       1
  H       T       1
  H       H       2

Convert to chances for 0, 1, 2 heads:

Distribution of Number of Heads

  # Heads:      0    1    2
  Probability:  1/4  2/4  1/4

Notice: 2^2 = 4 possible outcomes, and all the chances are of the form (# of ways)/4.

Next, three tosses:

The Sample Space

  Toss 1  Toss 2  Toss 3  # Heads  Prob
  T       T       T       0        1/8
  T       T       H       1        1/8
  T       H       T       1        1/8
  H       T       T       1        1/8
  T       H       H       2        1/8
  H       T       H       2        1/8
  H       H       T       2        1/8
  H       H       H       3        1/8

Distribution of Number of Heads

  # Heads:      0    1    2    3
  Probability:  1/8  3/8  3/8  1/8

Notice: 2^3 = 8 possible outcomes, and all the chances are of the form (# ways)/8.

In general: toss a coin n times. There are 2^n possible outcomes, and each has chance 1/2^n. Count up the number of ways to get the desired number of heads; the chance of that many heads is (# ways)/2^n.

What if it had been a thumbtack? More difficult to compute the chances. Use the rules.

Jargon: "get 56 heads" is an event, a collection of outcomes, here all the outcomes with 56 heads and 142 tails. A simpler case: toss a coin three times. Some events:

  Get 2 heads:          {HHT, HTH, THH}
  Get 0 heads:          {TTT}
  Get at least 2 heads: {HHT, HTH, THH, HHH}

Rules of probability. In the following, A and B are shorthand for two events, and S is the collection of all possible outcomes, the sample space.

1) 0 ≤ P(A) ≤ 1.
2) P(S) = 1.
3) P(A doesn't happen) = 1 − P(A).
4) If it is impossible for both A and B to happen at the same time, and C is the event "either A happens or B happens or both happen", then P(C) = P(A) + P(B).
5) If A and B are independent, and D is the event "both A and B happen", then P(D) = P(A) × P(B).

4) is the addition rule; 5) is the multiplication rule.

Independent means: whether or not A happens doesn't influence the chance that B happens. Example: back to the two thumbtacks.
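The counting above can be automated. A sketch that enumerates all 2^n outcomes for n fair-coin tosses and tallies the number of heads (the function name is my own):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n):
    """Distribution of # heads in n fair-coin tosses, by listing all 2^n outcomes."""
    # product("HT", repeat=n) generates the whole sample space, e.g. HHT, HTH, ...
    counts = Counter("".join(t).count("H") for t in product("HT", repeat=n))
    # Each outcome has chance 1/2^n, so the chance of k heads is (# ways) / 2^n.
    return {k: Fraction(ways, 2 ** n) for k, ways in sorted(counts.items())}
```

For n = 3 this reproduces the table above: chances 1/8, 3/8, 3/8, 1/8 for 0, 1, 2, 3 heads. Enumeration only works for small n, which is why the binomial formula and the normal approximation matter.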
A is "left tack lands U" and B is "right tack lands U". Suppose P(A) = 1/3 and P(B) = 1/4. If I tell you A happened, you should still think the probability that B will happen is 1/4: there is no way for the result of the left toss to influence the result of the right toss. D is the event that both A and B happen, that is, we get UU. Then

  P(D) = P(A) × P(B) = 1/3 × 1/4 = 1/12.

Recognizing use of the Binomial distribution:

1) Repeat the same basic experiment a fixed number of times, say n.
2) Each trial results either in something happening (called "SUCCESS" or 'S') or not happening ("FAILURE" or 'F').
3) The trials are independent: the outcome of one does not influence any other.
4) The probability of success on each trial is the same, say p.

If so, then

  P(exactly k successes) = n! / (k!(n − k)!) × p^k (1 − p)^(n−k).

Notation in other books:

  (n choose k) = n! / (k!(n − k)!) = nCk = C(n, k).

In this course we don't memorize the formula. How is the formula derived? Using the rules of probability above and combinatorics, that is, counting.

Graphical presentation of binomial probabilities: draw a histogram. The area of each bar equals the chance of the corresponding value, so the total area of all the bars is 1.

[Histograms of binomial probabilities: p = 1/2 with n = 2, 3, 16, 64, 256, then p = 1/4 with the same values of n.]

Notice the increasing symmetry, and the general shape of the normal curve. Now superimpose normal curves! Idea: compute probabilities by adding up areas of bars, or make a normal approximation. To do so we need the mean and standard deviation of the histogram:

  mean: µ = np
  SD:   σ = √(np(1 − p)).

Superimpose normal curves.
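The binomial formula translates directly into code; a minimal sketch, with `math.comb` doing the counting:

```python
from math import comb

def binom_prob(k, n, p):
    """P(exactly k successes) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)
```

For example, `binom_prob(2, 3, 0.5)` recovers the 3/8 from the three-toss table, and summing over k = 0, ..., n gives total probability 1, which is the statement that the histogram bars have total area 1.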
[Histograms with superimposed normal curves: p = 0.5 and p = 0.25, each with n = 2, 3, 16, 64, 256.]

Making normal approximations to compute binomial probabilities:
1) Identify the range.
2) Compute the mean and standard deviation.
3) Convert the range to standard deviation units.
4) Look up the area in the normal table.

Example: probability of 1 or 2 heads in 3 tosses of a fair coin.

1) The range is 0.5 to 2.5. (The reason for the halves: the edges of the bars are at −0.5, 0.5, 1.5, and so on.)
2) The mean is µ = np = 3 × 0.5 = 1.5 and the SD is σ = √(3 × 0.5 × (1 − 0.5)) = 0.866.
3) Convert to standard deviation units: (0.5 − 1.5)/0.866 to (2.5 − 1.5)/0.866, or −1.15 to 1.15.
4) Look up the area: 0.7499 ≈ 0.75. The correct answer is 3/4 = 0.75.

Salk vaccine example. If the vaccine is ineffective, then the number of polio cases in the treatment group is like the number of heads in 198 tosses of a fair coin; that is, the number of cases in treatment has a Binomial distribution. Reasoning: the 198 cases were destined regardless of the outcome of the randomization; each case was assigned to treatment with the same chance, 1/2; and the cases were assigned independently.

Chance of 56 heads or fewer in 198 tosses:
- Limits: 56.5 or fewer.
- Mean: µ = 198 × 0.5 = 99.
- SD: σ = √(198 × 0.5 × 0.5) = 7.04.

Convert 56.5 to standard deviation units:

  (56.5 − 99) / 7.04 = −6.04.

Off the end of the tables! The chance is ≤ 0.0003 from the tables, but the actual chance from software is 7.7 × 10^-10. Interpretation: either the hypothesis above (where I wrote "if") is wrong, or something extremely unlikely has happened; we conclude that the hypothesis of no treatment effect is wrong.

Why not compute the chance of exactly 56 heads instead of 56 or fewer? Another example: imagine tossing a coin 10,000 times. Chance of exactly 5,000 heads? The range is 4999.5 to 5000.5.
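The four steps above can be sketched in Python, with the normal table replaced by the standard normal curve computed from `math.erf`; the function names are my own:

```python
from math import erf, sqrt

def normal_area(z_lo, z_hi):
    """Area under the standard normal curve between z_lo and z_hi."""
    Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # cumulative normal, replaces the table
    return Phi(z_hi) - Phi(z_lo)

def binom_normal_approx(n, p, lo, hi):
    """Approximate P(lo <= # successes <= hi) by the normal curve, with bar edges at half-integers."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))       # mean and SD of the histogram
    return normal_area((lo - 0.5 - mu) / sigma, (hi + 0.5 - mu) / sigma)
```

Here `binom_normal_approx(3, 0.5, 1, 2)` gives about 0.75, matching the worked example, and `binom_normal_approx(198, 0.5, 0, 56)` reproduces the far-off-the-table Salk tail of roughly 10^-9.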
The mean is µ = 5000 and the SD is σ = √(10000 × 0.5 × 0.5) = 50. Convert the range to standard units:

  (4999.5 − 5000)/50 to (5000.5 − 5000)/50,

which is −0.01 to 0.01, so the chance is approximately 0.0080. Notice: even the most likely outcome is not very likely!

In hypothesis testing we compute the chance of results "as extreme as or more extreme than" the results we actually got, assuming some hypothesis is true. If the chance comes out small, we conclude the hypothesis is (likely) not true.
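Both the exact value and the approximation for 5,000 heads in 10,000 tosses are easy to check. A sketch using `math.lgamma` (log-factorials) to sidestep the astronomically large numbers in the binomial formula:

```python
from math import erf, exp, lgamma, log, sqrt

def log_binom_prob(k, n):
    """Log of P(exactly k heads in n fair-coin tosses), via log-factorials."""
    # log of C(n, k) * (1/2)^n, computed as lgamma terms to avoid huge integers
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) - n * log(2)

exact = exp(log_binom_prob(5000, 10000))

# Normal approximation: area between z = -0.01 and z = 0.01.
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
sigma = sqrt(10000 * 0.5 * 0.5)                 # = 50
approx = Phi(0.5 / sigma) - Phi(-0.5 / sigma)
```

Both come out near 0.0080, confirming the point: the single most likely outcome has less than a 1% chance, which is why tests use "as extreme or more extreme" tail probabilities rather than the chance of one exact outcome.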