STAT 203 – Lecture 4-1

- The normal distribution is symmetric.
- Getting the probability between two z-scores.
- Translating standard scores to and from raw scores.
- Extreme values beyond the table.

So majestic!

Recap from last Friday: Say a value X follows the normal distribution with mean μ (mu, pronounced "mew") and standard deviation σ (sigma). We used the z-table to find things like the probability that X is more than 1.28 standard deviations above the mean. In other words, we found Pr(X > μ + 1.28σ). The cutoff μ + 1.28σ corresponds to a z-score of 1.28.

From the z-table, page 515:

z       Area between mean and z (%)    Area beyond z (%)
…       …                              …
1.27    39.80                          10.20
1.28    39.97                          10.03
1.29    40.15                          9.85
…       …                              …

Since we're looking at the values farther from the mean than the cutoff, we want the area beyond z:

Pr(X > μ + 1.28σ) = 10.03%, or about 10%.

Can we find Pr(X > μ - 1.28σ)? Hint: think symmetry.

Symmetry: the curve is the same on both sides of the mean. So the area below μ - 1.28σ equals the area above μ + 1.28σ, which is 10.03%. Everything else lies above μ - 1.28σ:

Pr(X > μ - 1.28σ) = 1 - .1003 = .8997, or about 90%.

What is the chance that X is more than 1 standard deviation away from the mean in either direction?

Start with Pr(X > μ + 1σ), or, because it's simpler to write, Pr(Z > 1).

By the table (page 514):

z       Area between mean and z (%)    Area beyond z (%)
…       …                              …
0.99    33.89                          16.11
1.00    34.13                          15.87
1.01    34.38                          15.62
…       …                              …

Pr(Z > 1) = .16, so by symmetry Pr(Z < -1) = .16 also.

Pr(Z > 1) + Pr(Z < -1) = .32

Not surprising, since Pr(-1 < Z < 1) = .68, and .68 + .32 = 1.00.

We could have done this the other way too. Working backwards from Pr(-1 < Z < 1) = .68, we get Pr(Z < -1) + Pr(Z > 1) = .32 by the complement, and then Pr(Z > 1) = .16 by symmetry.

One other thing to note: Z = 0 right at the mean, because the mean is 0 standard deviations above or below itself.

Let's try some uglier z-scores.
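As a cross-check on the table lookups above, here is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). The course itself works from the printed z-table and SPSS; Python is an assumption made only for illustration, and `area_beyond` is a hypothetical helper name. It reproduces Pr(Z > 1.28), the symmetry Pr(Z < -1.28) = Pr(Z > 1.28), and the one-standard-deviation results.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_beyond(z: float) -> float:
    """Upper-tail area Pr(Z > z), i.e. the table's 'area beyond z' for z >= 0."""
    return 1.0 - Z.cdf(z)


# Pr(Z > 1.28): the table gives 10.03%
p_above = area_beyond(1.28)

# Symmetry: the lower tail below -1.28 has the same area as the upper tail above 1.28
p_below = Z.cdf(-1.28)

# Pr(Z > -1.28) is everything except that lower tail
p_above_neg = 1.0 - p_below

# One standard deviation: one tail, both tails, and the middle
one_tail = area_beyond(1.0)
both_tails = Z.cdf(-1.0) + area_beyond(1.0)
middle = Z.cdf(1.0) - Z.cdf(-1.0)

for label, p in [
    ("Pr(Z > 1.28)", p_above),
    ("Pr(Z < -1.28)", p_below),
    ("Pr(Z > -1.28)", p_above_neg),
    ("Pr(Z > 1)", one_tail),
    ("Pr(|Z| > 1)", both_tails),
    ("Pr(-1 < Z < 1)", middle),
]:
    print(f"{label:15s} {p:.4f}")
```

Rounded to four decimals this prints about .1003, .1003, .8997, .1587, .3173 and .6827, matching the table's 10.03% and the rounded .16, .32 and .68 used in the notes.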
Pr(-1.75 < Z < 0.52)

z       Area between mean and z (%)    Area beyond z (%)
…       …                              …
0.51    19.50                          30.50
0.52    19.85                          30.15
…       …                              …
1.74    45.91                          4.09
1.75    45.99                          4.01
…       …                              …

Doing the math: Pr(-1.75 < Z < 0.52) can be split into two ranges, using the mean as the split point:

Pr(-1.75 < Z < 0) + Pr(0 < Z < 0.52)

Why would we do this? Because the table measures everything from the mean. By symmetry, the area between -1.75 and the mean is the same as the area between the mean and 1.75.

Pr(-1.75 < Z < 0) = .4599
Pr(0 < Z < 0.52) = .1985
.4599 + .1985 = .6584

That is about 66% of the area.

[Figure: the 66% area under the normal curve]

Z-scores, or standard scores, are a bridge between real data and the probabilities surrounding them. We find z-scores with this formula (important!):

Z = (X - μ) / σ

- X is the value we're interested in. We usually want the probability of getting a value below or above X. X is also called the raw score, meaning we haven't prepared it for use at all. Raw as in "uncooked".
- μ is the mean. In most cases this will be given to you. Look for clues like "average" and "centered around".
- σ is the standard deviation. In most cases it's given or computed from SPSS.
- Z is the z-score: the number of standard deviations X lies above the mean (negative if X is below the mean). The z-score is also called the standard score.

Example problem: The time full-time students spend on homework, in hours/week, is normally distributed with mean 25 and standard deviation 7. What proportion of students spend more than 20 hours on homework?

Step 1: Identify. μ = 25, σ = 7, x = 20. We want the proportion, which is like the probability. We know the distribution is normal. These are clues to find the z-score (standard score) and use it in the z-table to get the proportion.

Step 2: Apply.
What do we want?! Z!
What do we have?! μ = 25, σ = 7, x = 20!
Use the formula that has Z on one side and μ, σ, and x on the other:

Z = (20 - 25) / 7 = -5 / 7 = -0.71 (rounded to two decimals)

-0.71 isn't on the table, but by symmetry we can use 0.71. By the table, 26.11% of the area is between the mean and z = 0.71, and 23.89% is beyond z = 0.71.
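The two calculations in this part (splitting a range at the mean, and standardizing a raw score before looking it up) can be mirrored in a short sketch. It again assumes Python's standard-library `statistics.NormalDist`; `area_between_mean_and` and `z_score` are hypothetical helper names chosen to match the table's column heading and the formula Z = (X - μ) / σ.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def area_between_mean_and(z: float) -> float:
    """Table column 'area between mean and z': Pr(0 < Z < |z|).

    Using abs(z) applies the symmetry rule, so negative z-scores work too.
    """
    return Z.cdf(abs(z)) - 0.5


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard score: how many standard deviations x lies above the mean."""
    return (x - mu) / sigma


# Pr(-1.75 < Z < 0.52), split at the mean: .4599 + .1985 = .6584
left = area_between_mean_and(-1.75)
right = area_between_mean_and(0.52)
split_total = left + right

# Homework example: mu = 25, sigma = 7, x = 20
z = z_score(20, mu=25, sigma=7)            # about -0.714
z_table = round(z, 2)                      # the table is indexed to two decimals: -0.71
between = area_between_mean_and(z_table)   # about .2611
beyond = 0.5 - between                     # about .2389

print(f"left={left:.4f} right={right:.4f} total={split_total:.4f}")
print(f"z={z:.3f} (table z={z_table}) between={between:.4f} beyond={beyond:.4f}")
```

The split mirrors how the printed table is laid out. With a full cumulative distribution function available, the same range could be computed in one step as `Z.cdf(0.52) - Z.cdf(-1.75)`, which gives the same .6584.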
We want Pr(X > 20), which is Pr(Z > -0.71).

Method 1: Split
Pr(Z > -0.71) = Pr(Z > 0) + Pr(-0.71 < Z < 0) = .5000 + .2611 = .7611

Method 2: Complement
Pr(Z > -0.71) = 1 - Pr(Z < -0.71) = 1 - .2389 = .7611

So about 76% of students spend more than 20 hours/week on homework.

We can work backwards from a probability to get a value too, with this formula (also important):

X = μ + Zσ

This is the same formula as the z-score (standard score) formula, but rearranged so that X is the value we get out of it.

Example problem: Homework/week is normally distributed, μ = 25, σ = 7. What's the minimum amount of homework I can expect 90% of the class to do? In other words, Pr(X > ???) = .9000.

Step 1: Identify. We have the proportion, and we want the value x. Again, the z-score is going to be our bridge. Going X → Z → Prob, we used the table last. Going Prob → Z → X, we use the table first.

If 90% of the area is above X, then 10% is below it, so X is below the mean. By symmetry, we look for the positive z-score with 10% of the area beyond it, then flip its sign. As z increases, the area beyond z decreases:

z                  0.00    0.01    0.02    0.03    0.04    0.05    …
% area beyond z    50.00   49.60   49.20   48.80   48.40   48.01   …

We can use that to find the z-score with 10% beyond it (approximation may be needed):

z                  0.00    0.01    0.02    …    0.44    0.45    0.46    …    1.27    1.28
% area beyond z    50.00   49.60   49.20   …    33.00   32.64   32.28   …    10.20   10.03

Now we know Pr(Z > 1.28) = 10.03%; that's the table entry closest to 10%. By symmetry, Pr(Z < -1.28) = 10.03% too, so Pr(Z > -1.28) is about 90% and the z-score we need is -1.28.

Step 2: Apply.
What do we want?! X!
What do we have?! μ = 25, σ = 7, z = -1.28!

X = 25 + (-1.28)(7) = 25 - 8.96 = 16.04

So 90% of full-time students spend 16.04 hours or more on homework.

What proportion of students spend more than 60 hours/week? μ = 25, σ = 7, x = 60.

Z = (60 - 25) / 7 = 35 / 7 = 5

Now we have z = 5. How do we get Pr(Z > 5)? The table only goes up to about z = 3.5.

Use inference: we want the area beyond z = 5, and that area shrinks as z goes up. The smallest area beyond z in the table is 0.01%, so the area beyond z = 5 must be smaller than that. That's all we can tell from this table: fewer than 0.01% of students spend more than 60 hours/week on homework.

(For interest) Very few data points are more than six standard deviations above or below the mean.
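The backwards lookup (the 90% cutoff) and the two beyond-the-table cases (z = 5 and six standard deviations) can be checked with the same standard-library tools. This is a sketch under the assumption that Python's `statistics.NormalDist` stands in for the printed table; `raw_score` is a hypothetical helper implementing X = μ + Zσ.

```python
from statistics import NormalDist

MU, SIGMA = 25, 7                     # homework hours/week from the example
Z = NormalDist()                      # standard normal
hours = NormalDist(mu=MU, sigma=SIGMA)


def raw_score(z: float, mu: float, sigma: float) -> float:
    """Rearranged z-score formula: X = mu + z * sigma."""
    return mu + z * sigma


# Working backwards: 90% above X means 10% below, so z is the 10th percentile
z_10 = Z.inv_cdf(0.10)                 # about -1.2816; the table gives -1.28
x_table = raw_score(-1.28, MU, SIGMA)  # 25 - 8.96 = 16.04 hours
x_exact = hours.inv_cdf(0.10)          # about 16.03 hours without table rounding

# Beyond the table: z = (60 - 25) / 7 = 5
z_60 = (60 - MU) / SIGMA
tail_60 = Z.cdf(-z_60)                 # Pr(Z > 5), computed as Pr(Z < -5) by symmetry

# Both tails beyond six standard deviations
six_sigma_tails = 2 * Z.cdf(-6)

print(f"z_10={z_10:.4f} x_table={x_table:.2f} x_exact={x_exact:.2f}")
print(f"z_60={z_60} Pr(Z>5)={tail_60:.2e} Pr(|Z|>6)={six_sigma_tails:.2e}")
```

Here `cdf` goes from a value to a probability (X → Z → Prob) and `inv_cdf` goes from a probability back to a value (Prob → Z → X). Computing an upper tail as `Z.cdf(-z)` rather than `1 - Z.cdf(z)` is a deliberate choice: for large z, `Z.cdf(z)` is extremely close to 1, so subtracting it from 1 discards most of the significant digits. The computed Pr(Z > 5) is roughly 0.00003%, consistent with the table-based conclusion that it is below 0.01%.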
The combined area beyond six standard deviations on either side is far less than 0.01%. Six Sigma is a business practice based on making each part in a machine consistent enough that it will work as long as it stays within six standard deviations (6σ) of the mean.

Next time:
- A few more notes on z-scores
- Discuss the midterm
- We start chapter 6.