The (“Sampling”) Distribution for the Sample Mean*

Distribution of Sample Means
A quantitative population of N units, with parameters: mean $\mu$ and standard deviation $\sigma$.
A random sample of n units from the population.
Statistic: the sample mean $\bar{X}$.

Distribution of Sample Means
Statistic: the sample mean $\bar{X}$.
This statistic is an unbiased point estimate (on average correct) of the parameter $\mu$:
$$\mu_{\bar{X}} = \mu$$

20 Times Rule / 5% Rule (same thing)
If the population size (N) is at least 20 times the sample size (n),
$$\frac{N}{n} \ge 20 \quad\text{or}\quad \frac{n}{N} \le 0.05,$$
then the standard deviation of $\bar{X}$ is (essentially)
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$

Distribution of the Sample Mean
Given: a variable whose population is Normally distributed with mean $\mu$ and standard deviation $\sigma$; a random sample of size n; a population size at least 20 times n.
Result: the sample mean $\bar{X}$ has a Normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Example
Rolls of paper leave a factory with weights that are Normal with mean $\mu$ = 1493 lbs and standard deviation $\sigma$ = 12 lbs.

Finding probabilities
What is the probability that a roll weighs over 1500 lbs?
$$Z = \frac{1500 - 1493}{12} = 0.5833$$
ANS: 0.2798 (about 28% of rolls exceed 1500 lbs)

New Question
A truck transports 8 rolls at a time. The legal weight limit for the truck is 12,000 lbs. What is the probability that 8 rolls have a total weight exceeding this limit?
Since 12000/8 = 1500, the question could also be phrased: what is the probability that 8 rolls have a (sample) mean weight exceeding 1500 lbs?
The bad news: the answer is not 0.2798. The good news: it's not that tough.

Distribution of the Sample Mean (review of previous slide)
Given: a variable whose population is Normally distributed with mean $\mu$ and standard deviation $\sigma$; a random sample of size n ($N/n \ge 20$).
Result: the sample mean $\bar{X}$ has a Normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Example (continued)
Rolls (single rolls) of paper leave a factory with weights that are Normal with mean $\mu$ = 1493 lbs and standard deviation $\sigma$ = 12 lbs.
If n = 8 rolls are randomly selected, what is the probability that their sample mean weight exceeds 1500 lbs?
The distribution of sample means $\bar{X}$ is Normal, with
$$\mu_{\bar{X}} = 1493, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{8}} = 4.243.$$

Finding probabilities
Find the probability that the sample mean is over 1500 lbs. Here we use the same mean, but a standard deviation reduced to 4.243.
$$Z = \frac{1500 - 1493}{4.243} = \frac{7}{4.243} = 1.650$$
ANS: 0.0495

Interpreting the Result
The probability that the sample mean for 8 rolls exceeds 1500 lbs is 0.0495: for 4.95% of all possible samples of 8 rolls, the sample mean exceeds 1500 lbs.
Equivalently, there is a 0.0495 probability that the total weight will exceed 8 × 1500 = 12,000 lbs.
We're working toward using the sample mean as an estimate of the population mean.

The Picture
[Figure: distribution of weights of single rolls and distribution of sample mean weights for samples of 8 rolls, on a Weight (lbs) axis from 1453 to 1533 with 1500 marked.]
About 28% of all rolls are > 1500 lbs.
About 5% of all samples of 8 rolls have mean > 1500 lbs.

Example
Survival times have a right-skewed distribution with mean $\mu$ = 13 months and standard deviation $\sigma$ = 12 months. What can we say about the distribution of sample mean survival times for samples of n patients?
$$\mu_{\bar{X}} = 13, \qquad \sigma_{\bar{X}} = \frac{12.0}{\sqrt{n}}$$
As n gets larger, the distribution gets closer to Normal.

[Figure: on an axis from 0 to 60 months with 13 marked, the distribution of single values (SD = 12.0) and of sample means for n = 4 (SD = 6.0), n = 16 (SD = 3.0), and n = 64 (SD = 1.5).]

Distribution of the Sample Mean
Given: a variable whose population is not Normally distributed, with mean $\mu$ and standard deviation $\sigma$; a random sample of size n. Assume the population size is at least 20 times n.
Result: the sample mean has an approximate Normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Distribution of the Sample Mean
Given: a variable whose population is not Normally distributed, with mean $\mu$ and standard deviation $\sigma$; a random sample of size n.
Result: in general, the sample mean has an unknown distribution, with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Distribution of the Sample Mean: Central Limit Theorem (CLT)
Given: a variable whose population is not Normally distributed, with mean $\mu$ and standard deviation $\sigma$; a random sample of size n, where n is sufficiently large.
Result: the sample mean has an approximate Normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

What is “Sufficiently Large?”
Your book says “generally n at least 30.”
If the population is fairly symmetric without outliers, considerably fewer than 30 will do the trick.
If the population is highly skewed, or not unimodal, considerably more than 30 may be required.
If the population is Normal, then sample size is not a concern: the sample mean is Normal.
You may use the “30” rule if you recognize that it's not that black and white, and that for Normal populations, n = 1 is “sufficiently large.”

Example
The Census Bureau reports that the average age at death for female Americans is 79.7 years, with standard deviation 14.5 years: $\mu$ = 79.7 years, $\sigma$ = 14.5 years.
What can we say about the distribution of sample means for samples of size 7?
It has mean $\mu_{\bar{X}} = 79.7$.
It has standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 14.5/\sqrt{7} = 5.48$.
Is the distribution Normal?

Example
Distribution of longevity: $\mu \approx 80$, $\sigma \approx 15$.
If it were Normal:
Within 1 s.d.: (65, 95), 68%
Within 2 s.d.s: (50, 110), 95%
Above 110: 2.5%, that is, 1 in 40 ??? No way! The distribution is not Normal.
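The Central Limit Theorem claim and the SD values on the survival-time slide (6.0, 3.0, 1.5 for n = 4, 16, 64) can be checked by simulation. The sketch below is a minimal Python version that rests on one stated assumption. The slides say only that survival times are "right skewed," so here they are modeled as a gamma distribution matched to mean 13 and SD 12. The sketch uses the standard library's `random.Random.gammavariate(alpha, beta)`, which has mean alpha·beta and variance alpha·beta². The names `simulate_means` and `skewness` are illustrative only.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

# ASSUMED population model: gamma matched to the survival-time example
# (mean 13 months, SD 12 months). Right skewed, but not the slides' data.
MU, SIGMA = 13.0, 12.0
SHAPE = (MU / SIGMA) ** 2   # k, chosen so that k * theta = MU
SCALE = SIGMA ** 2 / MU     # theta, chosen so that k * theta**2 = SIGMA**2


def skewness(xs):
    """Population-style skewness: the mean of cubed z-scores."""
    m = fmean(xs)
    s = pstdev(xs, m)
    return fmean([((x - m) / s) ** 3 for x in xs])


def simulate_means(n, reps, rng):
    """Draw `reps` samples of size n from the model; return their means."""
    return [
        fmean([rng.gammavariate(SHAPE, SCALE) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(2024)  # fixed seed so the run is reproducible
results = {}
for n in (4, 16, 64):
    means = simulate_means(n, 10_000, rng)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    m, sd, sk = results[n]
    print(f"n={n:2d}: mean={m:6.2f}  sd={sd:5.2f} "
          f"(sigma/sqrt(n)={SIGMA / sqrt(n):4.2f})  skew={sk:5.2f}")
```

The simulated SDs should track $\sigma/\sqrt{n}$, and the skewness should shrink roughly like $1/\sqrt{n}$. That shrinking skewness is the drift toward Normal that the CLT describes. The exact digits depend on the seed.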
Example
The Normal shouldn't be used here (why not?)
[Figure: histogram of Age at Death (years) for Women, axis from 30 to 90, vertical axis Percent of Women from 0 to 16.]

Example
The Normal shouldn't be used here (why not?)
The distribution of age at death is not Normal. It is quite left skewed.
The sample size is not sufficiently large. (At least 30 by your book, although for this situation your instructor would probably accept as low as 20.)
The Central Limit Theorem can't be applied.
The sample mean doesn't have an approximate Normal distribution.

Example
What can we say about the distribution of sample means for samples of size 7?
It has mean $\mu_{\bar{X}} = 79.7$.
It has standard deviation $\sigma_{\bar{X}} = 14.5/\sqrt{7} = 5.48$.
Is the distribution Normal? NO!

Example
$\mu$ = 79.7 years, $\sigma$ = 14.5 years.
I looked at a few recent obituaries in the Oswego Daily News (online):
79 70 48 99 85 71 45
$\bar{X}$ = 71.00, S = 19.36

Example
$\mu_{\bar{X}} = 79.7$, $\sigma_{\bar{X}} = 14.5/\sqrt{7} = 5.48$.
This sample has $\bar{X}$ = 71.0, a difference of 8.7.
Can we compute a Z score for 71.0? Should we? Why not?
$$Z = \frac{71.0 - 79.7}{5.48} = \frac{-8.7}{5.48} = -1.59$$
71.0 is 1.59 standard deviations below 79.7. This suggests that 71.0 (8.7 from 79.7) is somewhat, but not extremely, unusually low.

Example
Should we use the Table to obtain probabilities from Z scores (such as our Z = −1.59)? NO.
If not, how could we get the probability of a result within 8.7 of 79.7? Use either:
A huge database of longevities: simulate many (all possible) samples of size 7, and determine what proportion of samples give a mean no more than 8.7 from 79.7.
A mathematical model for the longevities (preferred method: much more compact, faster to work with, essentially identical results): either determine the model for sample means using calculus, or approximate it using numerical methods.

Example
What is the distribution of the sample mean for samples of size n = 48?
$$\mu_{\bar{X}} = 79.7, \qquad \sigma_{\bar{X}} = \frac{14.5}{\sqrt{48}} = 2.09$$
Even though age at death is left skewed, with n = 48 (large enough) the Central Limit Theorem applies, and the sample mean has an approximate Normal distribution.

Example
I looked at 41 more recent obituaries (total of 48):
79 70 87 71 101 89 64 44 75 49 69 91
71 51 48 90 93 92 81 50 85 99 85 95
51 80 89 92 86 74 68 81 88 81 92 71
45 99 77 78 89 42 93 69 72 92 92 91
$\bar{X}$ = 77.52, S = 16.37

Example
[Figure: histogram of the 48 ages at death, axis from 40 to 100, with the median, mean, and mode marked.]

Example
Means for samples of 48 U.S. longevities: $\mu_{\bar{X}} = 79.7$, $\sigma_{\bar{X}} = 2.09$, Normal.
My sample: $\bar{X}$ = 77.52.
The sample mean is 79.7 − 77.52 = 2.18 from the population mean.
What is the probability that a random sample of 48 U.S. women's deaths gives a sample mean within 2.18 of 79.7?
2.18 below 79.7 is 77.52. 2.18 above 79.7 is 81.88.

Example
Between 77.52 and 81.88:
$$Z = \frac{2.18}{2.09} = 1.04$$
Probability = 0.8508 − 0.1492 ≈ 0.70
[Figure: Normal curve, Mean = 79.7, StDev = 2.08, with the area between 77.52 and 81.88 shaded (0.7054).]

Example
Find the probability that a random sample of 48 U.S. women's deaths gives a sample mean within 2.18 of 79.7.
Probability ≈ 0.70
About 30% (that's almost 1 in 3) of all samples of 48 deaths give a sample mean more than 2.18 from 79.7.

Example
Give two explanations that account for the 2.18-year difference between the data on Oswego longevity (which were lower on average) and the U.S. longevity parameter of 79.7.
1. Women in Oswego do not live as long on average as they do nationwide. That is: $\mu_{\text{Oswego}} < 79.7$.
2. Sampling variability (sampling “error”): $\mu_{\text{Oswego}} = 79.7$. About 30% of all samples of 48 women yield a mean 2.18 or more from 79.7. That isn't so uncommon. Our data aren't very inconsistent with the national result.
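The Oswego calculations can be reproduced in a short Python sketch using only the standard library. It covers the Z score for the 7 obituaries, the simulation approach the slides suggest when the CLT does not apply, and the Normal-based probability for all 48. The function names are illustrative. The population model used in the simulation is HYPOTHETICAL: 110 minus a gamma variable, tuned to mean 79.7 and SD 14.5 so that it is left skewed. It stands in for the "huge database" or real model the slides describe. Its simulated probability is therefore an illustration, not an answer from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

MU, SIGMA = 79.7, 14.5   # Census parameters for U.S. women's age at death


def summarize(ages):
    """Return n, sample mean, sample SD (n - 1 divisor), and sigma/sqrt(n)."""
    n = len(ages)
    return n, fmean(ages), stdev(ages), SIGMA / sqrt(n)


def prob_mean_within(draw, mu, n, d, reps=20_000, seed=1):
    """Monte Carlo estimate of P(|Xbar - mu| <= d) for samples of size n.

    `draw(rng)` must return one value from the population model.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if abs(fmean([draw(rng) for _ in range(n)]) - mu) <= d:
            hits += 1
    return hits / reps


# HYPOTHETICAL left-skewed model: 110 minus a gamma variable with
# mean 110 - 79.7 and SD 14.5. Illustration only, not Census data.
G_MEAN = 110 - MU
G_SHAPE = (G_MEAN / SIGMA) ** 2
G_SCALE = SIGMA ** 2 / G_MEAN


def draw_age(rng):
    return 110 - rng.gammavariate(G_SHAPE, G_SCALE)


# n = 7: the Z score describes position, but the CLT does not apply,
# so the probability comes from simulating a population model instead.
small = [79, 70, 48, 99, 85, 71, 45]
n7, xbar7, s7, se7 = summarize(small)
z7 = (xbar7 - MU) / se7
p7_model = prob_mean_within(draw_age, MU, n7, abs(xbar7 - MU))

# n = 48: the CLT applies, so Xbar is approximately Normal(MU, se48).
ages48 = [
    79, 70, 87, 71, 101, 89, 64, 44, 75, 49, 69, 91,
    71, 51, 48, 90, 93, 92, 81, 50, 85, 99, 85, 95,
    51, 80, 89, 92, 86, 74, 68, 81, 88, 81, 92, 71,
    45, 99, 77, 78, 89, 42, 93, 69, 72, 92, 92, 91,
]
n48, xbar48, s48, se48 = summarize(ages48)
d48 = MU - xbar48
dist48 = NormalDist(MU, se48)
p48_within = dist48.cdf(MU + d48) - dist48.cdf(MU - d48)

print(f"n=7:  xbar={xbar7:.2f} s={s7:.2f} se={se7:.2f} z={z7:.2f} "
      f"P(within {abs(xbar7 - MU):.1f}), hypothetical model={p7_model:.3f}")
print(f"n=48: xbar={xbar48:.2f} s={s48:.2f} se={se48:.2f} "
      f"P(within {d48:.2f})={p48_within:.3f}")
```

`NormalDist.cdf` uses the unrounded $\sigma_{\bar{X}}$ and Z, so its n = 48 probability can differ slightly from a table lookup. Both round to about 0.70.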
Sampling Without Replacement
What to do if the sample size is more than 5% of the population size, so that the 20 Times Rule ($N/n \ge 20$, equivalently $n/N \le 0.05$) fails?
N = population size, n = sample size.

Distribution of Sample Means
The distribution of the sample mean has:
> mean $\mu_{\bar{X}} = \mu$ (“unbiased”)
> standard deviation $\sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}$
> shape closer to Normal (but not necessarily Normal)

Word Lengths – Gettysburg Address
Individual word lengths, N = 268 words:
Mean length $\mu$ = 4.295
Standard deviation $\sigma$ = 2.123
Not Normal. Right skewed. Can't use Table A2.
[Figure: dot plot of individual word lengths, from 1 to 11; each symbol represents up to 2 observations.]

Distribution of Sample Means: n = 5
Sample means $\bar{X}$ from samples of size n = 5 have:
> mean $\mu_{\bar{X}} = 4.295$
> standard deviation $\sigma_{\bar{X}} = \sqrt{\frac{268-5}{268-1}}\,\frac{2.123}{\sqrt{5}} = \sqrt{\frac{263}{267}}\,(0.9494) = (0.9925)(0.9494) = 0.942$
> shape closer to Normal (but not Normal: a bit right skewed)

Distribution of Sample Means: n = 5
[Figure: dot plot of Sample Mean Word Length, axis from 1.6 to 8.0; each symbol represents up to 22 observations.]
The mean of this distribution is 4.295.
The standard deviation of this distribution is 0.942.
The shape is close to Normal (but not Normal: there's right skew).

Distribution of Sample Means: n = 10
Sample means $\bar{X}$ from samples of size n = 10 have:
> mean $\mu_{\bar{X}} = 4.295$
> standard deviation $\sigma_{\bar{X}} = \sqrt{\frac{268-10}{268-1}}\,\frac{2.123}{\sqrt{10}} = \sqrt{\frac{258}{267}}\,(0.6714) = (0.9830)(0.6714) = 0.660$
> shape closer to Normal (but not exactly Normal: a bit right skewed)

Distribution of Sample Means: n = 10
[Figure: dot plot of Sample Mean Word Length, axis from 1.6 to 8.0; each symbol represents up to 31 observations.]
The mean of this distribution is 4.295.
The standard deviation of this distribution is 0.660.
The shape is quite close to Normal (just a little right skew, not enough to fuss over).
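The standard deviation formula with the finite population correction can be written as a small function. The sketch below is in Python. The function name `sd_sample_mean` is illustrative. The inputs are the Gettysburg Address values from the slides ($\sigma$ = 2.123, N = 268).

```python
from math import sqrt


def sd_sample_mean(sigma, n, N=None):
    """SD of the sample mean for a simple random sample of size n.

    With N=None the population is treated as effectively infinite
    (sigma / sqrt(n)). Otherwise the finite population correction
    sqrt((N - n) / (N - 1)) is applied for sampling without replacement.
    """
    base = sigma / sqrt(n)
    if N is None:
        return base
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return base * sqrt((N - n) / (N - 1))


SIGMA, N = 2.123, 268   # Gettysburg Address word lengths
for n in (5, 10):
    plain = sd_sample_mean(SIGMA, n)
    corrected = sd_sample_mean(SIGMA, n, N)
    factor = sqrt((N - n) / (N - 1))
    print(f"n={n}: sigma/sqrt(n)={plain:.4f}  factor={factor:.4f}  "
          f"corrected={corrected:.3f}")
```

Sampling the whole population (n = N) gives a standard deviation of 0: every "sample" is the population itself, so its mean is always $\mu$.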
n = 5: $\sigma_{\bar{X}} = \sqrt{\frac{268-5}{268-1}}\,\frac{2.123}{\sqrt{5}} = \sqrt{\frac{263}{267}}\,(0.9494) = (0.9925)(0.9494) = 0.942$
n = 10: $\sigma_{\bar{X}} = \sqrt{\frac{268-10}{268-1}}\,\frac{2.123}{\sqrt{10}} = \sqrt{\frac{258}{267}}\,(0.6714) = (0.9830)(0.6714) = 0.660$
The correction factors 0.9925 and 0.9830 are awfully close to 1. The corrected values 0.942 and 0.660 are therefore almost the same as $\sigma/\sqrt{n}$ = 0.9494 and 0.6714.

n = 100: $\sigma_{\bar{X}} = \sqrt{\frac{268-100}{268-1}}\,\frac{2.123}{\sqrt{100}} = \sqrt{\frac{168}{267}}\,(0.2123) = (0.7932)(0.2123) = 0.1684$
The correction factor 0.7932 is not so close to 1, and 0.1684 is not almost the same as 0.2123. Here n/N = 100/268 ≈ 0.37, well above 0.05. For n = 5 and n = 10, n/N is only about 0.019 and 0.037.

Distribution of the Sample Mean
Given (PARAMETERS): a variable whose population is distributed with mean $\mu$ and standard deviation $\sigma$; a random sample of size n.
Results 1 and 2 (STATISTIC): the sample mean $\bar{X}$ has a distribution with the same mean and a smaller standard deviation:
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}$$

Distribution of the Sample Mean
Given: a variable whose population is distributed with mean $\mu$ and standard deviation $\sigma$; a random sample of size n.
Result 3: the sample mean $\bar{X}$ has a distribution with a shape that is closer to Normal.
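For simple random sampling without replacement, the corrected formula is an exact identity when $\sigma$ is computed with divisor N. It can therefore be verified by listing every possible sample from a small population. The population below is HYPOTHETICAL, chosen only to be small and right skewed; it is not the Gettysburg data. The sketch uses Python's `itertools.combinations` to enumerate all samples of size n.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pstdev

# HYPOTHETICAL small, right-skewed population (not the Gettysburg data),
# small enough to list every possible sample without replacement.
population = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 12]
N = len(population)
mu = fmean(population)
sigma = pstdev(population)   # population SD: divides by N, not N - 1

results = {}
for n in (2, 5, 10):
    # combinations() works on positions, so repeated values count as
    # distinct units, matching sampling units without replacement.
    means = [fmean(c) for c in combinations(population, n)]
    exact_sd = pstdev(means)          # every sample is equally likely
    naive = sigma / sqrt(n)
    corrected = naive * sqrt((N - n) / (N - 1))
    results[n] = (fmean(means), exact_sd, naive, corrected)
    print(f"n={n:2d} (n/N={n / N:.2f}): exact={exact_sd:.4f} "
          f"corrected={corrected:.4f} sigma/sqrt(n)={naive:.4f}")
```

The exact and corrected columns agree to floating-point precision for every n. The gap between them and $\sigma/\sqrt{n}$ widens as n/N grows, which mirrors the n = 100 Gettysburg case.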