Today’s Agenda:
- More examples with probability
- The normal distribution
- Probability and the normal distribution
- Z scores

You should be about halfway through Chapter 5 by now. (For the assignment as well.)

The probability rules.

Probability = ways event occurs / ways all events occur (when every outcome is equally likely)

Probability is always between zero and one.
A probability of zero means impossible. (Never happens)
A probability of one means certain. (Always happens)

Converse Rule: Pr(Not A) = 1 - Pr(A)
Addition Rule: Pr(A or B) = Pr(A) + Pr(B), when A and B never happen together.
Multiplication Rule: Pr(A and B) = Pr(A) x Pr(B), when A and B are independent.

Say that someone charged with a gun offence is…
- convicted of the Gun offence (G) with probability .52,
- convicted of a Lesser offence (L) with probability .26, and
- found Not guilty (N) with probability .22.

So Pr(G) = .52, Pr(L) = .26, Pr(N) = .22

Also, only one decision can be made, so these events are mutually exclusive.

We’re assuming that Guilty, Lesser charge, and Not guilty are the only options. Getting one of these is certain.
So Pr(G, L, or N) = 1

As a check:
Pr(G, L, or N) = Pr(G) + Pr(L) + Pr(N) = .52 + .26 + .22 = 1.00

We want the probability of being convicted on a gun or lesser charge: Pr(G or L)

Method one: Addition rule
Pr(G or L) = Pr(G) + Pr(L) = .52 + .26 = .78

Method two: Converse rule
Being convicted of any offence is the opposite of being found not guilty, so we could write Pr(G or L) as Pr(Not N).
Pr(Not N) = 1 - Pr(N) = 1 - .22 = .78

We expect to see the same answer both ways. There’s more than one way to get the right solution.

If two people are charged in separate trials, we may be interested in knowing the probability that neither is convicted.
We want: Pr(N1 and N2)
N1 means Person 1 is not guilty, N2 means Person 2 is not guilty.
The trials are independent; one verdict doesn’t affect the other.
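Before applying the multiplication rule to the two trials, the single-trial results above can be checked with a short Python sketch. It only restates the addition and converse rules using the probabilities from the example; the names `verdicts`, `pr_either` and `pr_not` are made up for illustration and are not from the textbook.

```python
# Verdict probabilities from the gun offence example.
verdicts = {"G": 0.52, "L": 0.26, "N": 0.22}

def pr_either(*events):
    """Addition rule. Only valid here because the verdicts are mutually exclusive."""
    return sum(verdicts[e] for e in events)

def pr_not(event):
    """Converse rule: Pr(Not A) = 1 - Pr(A)."""
    return 1 - verdicts[event]

total = pr_either("G", "L", "N")   # one of the three verdicts is certain
by_addition = pr_either("G", "L")  # method one
by_converse = pr_not("N")          # method two

print(round(total, 2))        # 1.0
print(round(by_addition, 2))  # 0.78
print(round(by_converse, 2))  # 0.78
```

The values are rounded before printing because computers store decimals like .52 only approximately.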
By the multiplication rule:
Pr(N1 and N2) = Pr(N1) x Pr(N2) = .22 x .22 = .0484

Another way to write Pr(N1 and N2) would be Pr(2 Not Guilty verdicts out of 2). These two are equivalent.

What about Pr(1 Not Guilty out of 2), or Pr(0 Not Guilty out of 2)?

For the sake of simpler math, let’s use coin flips instead, with
Heads = Not Guilty
Tails = Everything Else

Using the two verdicts as an analogy, we know
Pr(2 Heads of 2) = Pr(H) x Pr(H) = ½ x ½ = ¼
And likewise…
Pr(0 Heads of 2) = Pr(T) x Pr(T) = ½ x ½ = ¼

But Pr(1 Head of 2) is not Pr(T) x Pr(H)…
Pr(T) x Pr(H) = ¼, but 1 Head is the only other possibility, and that would give us only ¾ probability in total instead of 1.

Pr(1 Head of 2) actually covers two situations:
Heads, then Tails, with probability ¼
Tails, then Heads, with probability ¼
… for a total chance of ½.

Pr(0 heads) = 0.25
Pr(1 head) = 0.5
Pr(2 heads) = 0.25

Remember this curve? This is the normal curve.

μ, pronounced ‘mu’, is the mean of a normal distribution.
σ, pronounced ‘sigma’, is the standard deviation of a normal distribution.
μ + 2σ refers to the point two standard deviations above the mean.

If data follows the normal curve, about 2/3 of the data is within 1 standard deviation of the mean, and about 95% of the data is within 2 standard deviations.

2/3 and 95% are proportions, or ratios between a part of a group and that group as a whole.

Proportions are useful because they also imply probability. If 2/3 of the data is within 1 standard deviation, then if I pick a point at random from that distribution…
… there is a 2/3 chance that it will be within 1 standard deviation.

Example: Reading scores
Grade 5 reading scores are normally distributed with mean 120 and standard deviation 25.
Pick a grade 5 student at random…
You have a 95% chance of getting one with a reading score between 70 and 170.
(That is 2 standard deviations either side of the mean: 120 - 2 x 25 = 70 and 120 + 2 x 25 = 170.)

Example: Reading scores 2
The normal distribution is symmetric; it’s the same on both sides of the mean/median.
So the chance of picking a grade 5 student with a reading score of 120 or more is 0.5.

We can combine these rules and get other ranges.
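The reading score figures above can be checked with a quick Python sketch. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; this is not part of the course material, and the variable names are made up for illustration.

```python
from statistics import NormalDist

# Grade 5 reading scores: mean 120, standard deviation 25 (from the example).
reading = NormalDist(mu=120, sigma=25)

# Chance of a score within 2 standard deviations: between 70 and 170.
within_2sd = reading.cdf(170) - reading.cdf(70)

# Chance of a score within 1 standard deviation: between 95 and 145.
within_1sd = reading.cdf(145) - reading.cdf(95)

# Chance of a score of 120 or more (the upper half of a symmetric curve).
at_least_mean = 1 - reading.cdf(120)

print(round(within_2sd, 4))  # 0.9545, the "95%" rule
print(round(within_1sd, 4))  # 0.6827, the "about 2/3" rule
print(at_least_mean)         # 0.5
```

`cdf(x)` gives the probability of a value at or below x, so subtracting two of them gives the probability of landing between two scores.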
There is a 2/3 chance (68% if you care about the details) that a data point will be between μ - 1σ and μ + 1σ.

The area above the mean is the same as the area below the mean…

What if we wanted the probability of being in both of these ranges at once: above the mean and within 1 standard deviation of it?

By symmetry, half of that 68% is above the mean, so…
… we find half of 68%, or 34%.
Pr(value is between mean and 1 sd above mean) = 0.34

But what if we wanted…
Pr(Value is less than μ + 1.28σ)
… or something equally awful looking.

Bad news: The formula for finding most areas under the normal curve is so hard it can’t be written down exactly. (No, seriously: it has ‘no closed form’.)

Good news: We have a table that does most of the work for us.

11th edition: Appendix C, Table A, pages 513-516.
Earlier editions: Look for the “Percentage of Area under the Normal Curve” table.
No book: Search online for “Standard Normal Table” and look in images. (This one will likely look different from the textbook one.)

The first page looks like:

z       Area between Mean and z     Area beyond z
.00              .00                    50.00
.01              .40                    49.60
.02              .80                    49.20
.03             1.20                    48.80
.04             1.60                    48.40
.05             1.99                    48.01
.06             2.39                    47.61

Z, the z score, or the standard score, is the number of standard deviations above or below the mean.

Recall: Pr(value between mean and mean + 1 sd) = 0.34 = 34%

By the table…

z       Area between Mean and z     Area beyond z
…                …                       …
.99            33.89                    16.11
1.00           34.13                    15.87
1.01           34.38                    15.62

… it’s actually 34.13%.

We can also use the standard normal table to find the area past a certain point instead of between the mean and a point.

z       Area between Mean and z     Area beyond z
.00              .00                    50.00
.01              .40                    49.60
.02              .80                    49.20

A z-score of zero is right at the mean.
Pr(value 0 sd above mean or more) = .50 = 50%

Finally, to find Pr(Value is less than μ + 1.28σ), we find the area between μ and μ + 1.28σ…

z       Area between Mean and z     Area beyond z
…                …                       …
1.27           39.80                    10.20
1.28           39.97                    10.03
…                …                       …

… and add 50% for the lower half that wasn’t counted in the table.
39.97% + 50% = 89.97%, or about 90%.

We could have also used the probability that the value was NOT less than μ + 1.28σ, and the converse rule.
The area beyond z = 1.28 is 10.03%, or about 10%, so
1 - 0.10 = 0.90, or 100% - 10% = 90%

Next Monday: more on the z-table, and the relationship between z-scores and raw scores.

Keep reading Chapter 5.

For the bookless, stick around and I’ll show you your table.
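For anyone without the table to hand, the rows used today can be reproduced with a small Python sketch. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `table_row` is made up for illustration and the textbook table remains the reference for the course.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_row(z):
    """Return (area between mean and z, area beyond z) as percentages."""
    below = std_normal.cdf(z)                # proportion at or below z
    between = round((below - 0.5) * 100, 2)  # drop the lower half of the curve
    beyond = round((1 - below) * 100, 2)     # converse rule
    return between, beyond

print(table_row(1.00))  # (34.13, 15.87)
print(table_row(1.28))  # (39.97, 10.03)

between, beyond = table_row(1.28)
print(round(50 + between, 2))  # 89.97, adding back the lower half
print(round(100 - beyond, 2))  # 89.97, using the converse rule instead
```

Both routes to Pr(Value is less than μ + 1.28σ) give the same answer, about 90%.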