Chapter 8 Section 1 Distributions of the Sample Mean
Sullivan – Fundamentals of Statistics – 2nd Edition – Chapter 8 Section 1 – Slide 1 of 29

Chapter 8 – Section 1
● Learning objectives
1 Understand the concept of a sampling distribution
2 Describe the distribution of the sample mean for samples obtained from normal populations
3 Describe the distribution of the sample mean for samples obtained from a population that is not normal

● Learning objective 1: Understand the concept of a sampling distribution
● Often the population is too large to perform a census, so we take a sample instead
● How do the results of the sample apply to the population?
   What is the relationship between the sample mean and the population mean?
   What is the relationship between the sample standard deviation and the population standard deviation?
● This is statistical inference
● We want to use the sample mean x̄ to estimate the population mean μ
● For example, to estimate the mean height of eight-year-old girls, we can proceed as follows:
   Randomly select 100 eight-year-old girls
   Compute the sample mean of the 100 heights
   Use that sample mean as our estimate
● This is using the sample mean to estimate the population mean
● However, suppose we take a series of different random samples:
   Sample 1 – we compute sample mean x̄₁
   Sample 2 – we compute sample mean x̄₂
   Sample 3 – we compute sample mean x̄₃
   and so on
● Each time we sample, we may get a different result
● The sample mean x̄ is a random variable!
● Because the sample mean is a random variable,
   the sample mean has a mean
   the sample mean has a standard deviation
   the sample mean has a probability distribution
● This probability distribution is called the sampling distribution of the sample mean
● When we use the sample mean to estimate the population mean, we are estimating a parameter (a fixed number) with a random variable
● The sampling distribution of the sample mean is described in terms of
   the sample size n
   the population mean μ
   the population standard deviation σ

● Example
● We have the data 1, 7, 11, 12, 17, 17, 17, 21, 21, 21, 22, 22 and we want to take samples of size n = 3
● First, a histogram of the entire data set (figure not reproduced)
● The data are definitely skewed left, not bell shaped
● Taking some samples of size 3: the first sample, 17, 21, 12, has a mean of (17 + 21 + 12)/3 ≈ 16.7
● More sample means from more samples (table not reproduced)
● A histogram of 20 sample means (figure not reproduced)
● The original data set was skewed left, but the set of sample means is close to bell shaped

● Learning objective 2: Describe the distribution of the sample mean for samples obtained from normal populations
● If we know that the population has a normal distribution, then the sampling distribution (that is, the distribution of x̄) will also be normal
● In fact, the sampling distribution
   will be normally distributed
   will have a mean equal to the mean of the population
   will have a standard deviation less than the standard deviation of the population (for any sample size n > 1)
● Why does it have a smaller standard deviation?
● The population standard deviation
   is a measure of the distance between an individual value and the population mean
   can be thought of as the sampling error for a sample of size n = 1
● The standard deviation of the sample mean
   is a measure of the distance between the sample mean and the population mean
● It makes sense that the estimate is more accurate if we have taken more values (a larger n)
● The Law of Large Numbers says that as we add more observations to the sample (that is, as n gets larger), the difference between the sample mean x̄ and the population mean μ approaches 0
● That is to say, we get closer and closer to the right answer with larger and larger sample sizes
● The standard error, $\sigma_{\bar{x}}$, is the standard deviation of the sample mean
● The formula for the standard error is
   $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
● This is an extremely important formula
● What does this mean?
● If we have a normally distributed random variable X with mean μ and standard deviation σ, then the distribution of the sample mean x̄ is completely determined:
   we know that it is also normally distributed
   we know its mean
   we know its standard deviation
● So we can do all of our probability calculations!
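The standard-error formula can be checked by simulation. The sketch below is a minimal illustration under stated assumptions: it treats the 12 data values from the earlier example as the entire population and draws each sample with replacement (the slides do not say how their samples were drawn; with replacement is what makes $\sigma/\sqrt{n}$ hold exactly). The helper names `simulate_sample_means` and `standard_error` are hypothetical, chosen for this example only.

```python
import math
import random
import statistics

# The 12-value data set from the slides, treated here as the whole population.
POPULATION = [1, 7, 11, 12, 17, 17, 17, 21, 21, 21, 22, 22]


def simulate_sample_means(population, n, trials, seed=0):
    """Hypothetical helper: draw `trials` samples of size n WITH replacement
    (an assumption) and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]


def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


mu = statistics.fmean(POPULATION)        # population mean, 15.75
sigma = statistics.pstdev(POPULATION)    # population SD (divides by N), about 6.43

# Compare the simulated SD of x-bar with the formula for several sample sizes.
for n in (1, 3, 10, 30):
    means = simulate_sample_means(POPULATION, n, trials=20_000)
    print(f"n={n:2d}  mean of x-bar={statistics.fmean(means):6.2f}  "
          f"SD of x-bar={statistics.pstdev(means):5.2f}  "
          f"sigma/sqrt(n)={standard_error(sigma, n):5.2f}")
```

The mean of the simulated sample means should stay near 15.75 for every n, while their standard deviation shrinks in step with $\sigma/\sqrt{n}$. The same helper reproduces the slide example with σ = 12: `standard_error(12, 4)` is 6.0 and `standard_error(12, 9)` is 4.0.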
● If a simple random sample of size n is drawn from a large population with mean μ and standard deviation σ, then the sampling distribution of x̄ has
   mean $\mu_{\bar{x}} = \mu$ and
   standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
● In addition, if the population is normally distributed, then the sampling distribution of x̄ is normally distributed

● Example
● Suppose the random variable X has a normal distribution with a mean of 20 and a standard deviation of 12
   If we choose samples of size n = 4, then the sample mean will have a normal distribution with a mean of 20 and a standard deviation of 12/√4 = 6
   If we choose samples of size n = 9, then the sample mean will have a normal distribution with a mean of 20 and a standard deviation of 12/√9 = 4

● Learning objective 3: Describe the distribution of the sample mean for samples obtained from a population that is not normal
● This is great if our random variable X has a normal distribution
● However, what if X does not have a normal distribution?
● What can we do? Wouldn't it be very nice if the sampling distribution of x̄ were also normal?
This is almost true …
● The Central Limit Theorem states that, regardless of the shape of the population distribution, the sampling distribution of x̄ becomes approximately normal as the sample size n increases
● Thus
   if the random variable X is normally distributed, then the sampling distribution of x̄ is also normally distributed
   for all other random variables X, the sampling distribution of x̄ is approximately normal when n is large
● This approximation, that the sampling distribution is normal, is good for large sample sizes (large values of n)
● How large does n have to be?
● A rule of thumb: if n is 30 or higher, this approximation is usually quite good

● Example
● We have been told that the average weight of giraffes is 2400 pounds, with a standard deviation of 300 pounds
● We have measured 50 giraffes and found that the sample mean was 2600 pounds
● Are our data consistent with what we have been told?
● By the Central Limit Theorem (n = 50 is at least 30), the sample mean is approximately normal with mean 2400 (the same as the population) and standard deviation 300/√50 ≈ 42.4
● Using our calculations for the general normal distribution: 2600 is 200 pounds over 2400, and 200 pounds is 200/42.4 ≈ 4.7 standard deviations of the sample mean
● From our normal calculator, the probability of a sample mean at least this large is only about 1 chance in 1 million
● Something is definitely strange; we will see what to do about it later, in inferential statistics

Summary: Chapter 8 – Section 1
● The sample mean is a random variable with a distribution called the sampling distribution
   If the population is normal, this distribution is normal for any sample size n
   If the sample size n is sufficiently large (30 or more is a good rule of thumb), this distribution is approximately normal even when the population is not
   The mean of the sampling distribution is equal to the mean of the population: $\mu_{\bar{x}} = \mu$
   The standard deviation of the sampling distribution is equal to $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
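The Central Limit Theorem claim and the giraffe calculation above can both be checked numerically. This is a sketch under assumptions: the right-skewed population is an exponential distribution with mean 1 (an illustrative choice, not from the slides), whose skewness is 2, so theory predicts the skewness of x̄ to be about 2/√n. The helper `sample_skewness` is hypothetical. The giraffe numbers (μ = 2400, σ = 300, n = 50, x̄ = 2600) come from the example, and the tail probability uses Python's `statistics.NormalDist`.

```python
import math
import random
import statistics
from statistics import NormalDist


def sample_skewness(values):
    """Hypothetical helper: moment-based skewness (0 for symmetric data,
    positive for right skew)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# Illustrative right-skewed population (an assumption): exponential, mean 1.
rng = random.Random(1)
skew_by_n = {}
for n in (1, 5, 30):
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(20_000)]
    skew_by_n[n] = sample_skewness(means)   # theory: about 2 / sqrt(n)
    print(f"n={n:2d}  skewness of x-bar={skew_by_n[n]:5.2f}")

# Giraffe example: mu = 2400, sigma = 300, n = 50, observed x-bar = 2600.
mu, sigma, n, xbar = 2400, 300, 50, 2600
se = sigma / math.sqrt(n)                    # standard error, about 42.4
z = (xbar - mu) / se                         # about 4.7 standard errors
p_upper = 1 - NormalDist(mu, se).cdf(xbar)   # P(x-bar >= 2600), about 1.2e-6
print(f"SE={se:.1f}  z={z:.2f}  P={p_upper:.1e}")
```

The skewness of the sample means drops from about 2 toward 0 as n grows, which is the Central Limit Theorem at work; at n = 30 some skew remains, which is why n ≥ 30 is only a rule of thumb.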
Chapter 8 – Example 1
● The combined (verbal + quantitative reasoning) score on the GRE is normally distributed with mean 1066 and standard deviation 191. (Source: www.ets.org/Media/Tests/GRE/pdf/01210.pdf.) Suppose n = 15 randomly selected students take the GRE on the same day.
a. What is the probability that a randomly selected student scores above 1100 on the GRE? (0.4286)
b. Describe the sampling distribution of the sample mean. (Because the population is normal, it is normal with mean 1066 and standard deviation 191/√15 ≈ 49.3)
c. What is the probability that a random sample of 15 students has a mean GRE score that is less than 1100? (0.7549)
d. What is the probability that a random sample of 15 students has a mean GRE score that is 1100 or above? (0.2451)

Chapter 8 – Example 2
● In the United States, the year each coin was minted is printed on the coin.
To find the age of a coin, simply subtract the year printed on the coin from the current year. The ages of circulating pennies are right skewed. Assume the ages of circulating pennies have a mean of 12.2 years and a standard deviation of 9.9 years.
a. Based on the information given, can we determine the probability that a randomly selected penny is over 10 years old? (No, because the population of the ages of circulating pennies is right skewed, not normally distributed, so the normal model does not apply to a single penny.)
b. What is the probability that a random sample of 40 circulating pennies has a mean less than 10 years? (Since n = 40 is at least 30, the Central Limit Theorem says x̄ is approximately normal with mean 12.2 and standard deviation 9.9/√40 ≈ 1.565; the probability is 0.0793)
c. What is the probability that a random sample of 40 circulating pennies has a mean greater than 10 years? (0.9207)
d. What is the probability that a random sample of 40 circulating pennies has a mean greater than 15 years? Would this be unusual? (0.0367; yes, since this probability is less than 0.05)
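The answers to both worked examples can be reproduced with a few lines of Python. This sketch uses `statistics.NormalDist` for the exact normal CDF; the helper `prob_below` is a hypothetical name for this example. Because the slide answers were presumably read from a z-table with z rounded to two decimals, the computed values can differ from them in the fourth decimal place (for instance, about 0.4293 rather than 0.4286 in part a of Example 1).

```python
import math
from statistics import NormalDist


def prob_below(x, mu, sigma):
    """Hypothetical helper: P(X < x) for a normal variable with mean mu, SD sigma."""
    return NormalDist(mu, sigma).cdf(x)


# Example 1 (GRE): the population is normal, mu = 1066, sigma = 191, n = 15.
se_gre = 191 / math.sqrt(15)              # standard error, about 49.3
p_a = 1 - prob_below(1100, 1066, 191)     # a: one student scores above 1100
p_c = prob_below(1100, 1066, se_gre)      # c: mean of 15 students below 1100
p_d = 1 - p_c                             # d: mean of 15 students 1100 or above

# Example 2 (pennies): the population is right skewed, mu = 12.2, sigma = 9.9.
# With n = 40 (at least 30), the CLT makes x-bar approximately normal.
se_penny = 9.9 / math.sqrt(40)            # standard error, about 1.565
p_b2 = prob_below(10, 12.2, se_penny)     # b: mean below 10 years
p_c2 = 1 - p_b2                           # c: mean above 10 years
p_d2 = 1 - prob_below(15, 12.2, se_penny) # d: mean above 15 years

print(f"GRE:   a={p_a:.4f}  SE={se_gre:.1f}  c={p_c:.4f}  d={p_d:.4f}")
print(f"Penny: b={p_b2:.4f}  c={p_c2:.4f}  d={p_d2:.4f}")
```

Note that part a of Example 1 uses the population standard deviation 191, while parts c and d use the standard error 191/√15, because they concern the mean of 15 students rather than one student.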