QBM117 Business Statistics
Statistical Inference
Sampling Distribution of the Sample Mean

Objectives
• To revise the differences between sample statistics and population parameters.
• To introduce the sampling distribution of a sample statistic.
• To understand the central limit theorem.

Populations and Samples
• A population is the entire collection of items about which information is desired.
• A sample is a subset of the population that we collect data from.

Parameters and Statistics
• A parameter is a number that describes a population.
- A parameter is a fixed number.
• A statistic is a number that describes a sample.
- A statistic is a random variable whose value changes from sample to sample.

Statistical Inference
• Population parameters are almost always unknown.
• We take a random sample from the population of interest and calculate the sample statistic.
• We then use the sample statistic as an estimate of the population parameter.
• Statistical inference involves drawing conclusions about a population based on sample information.

Example 1
Electronics Associates Industry (EAI) is an international company that manufactures a diverse line of products. The firm’s Director of Personnel has been assigned the task of developing a profile of the company’s 2500 managers. One of the characteristics to be identified is the mean annual salary for the managers.

The population is the 2500 managers. The population parameter is the mean annual salary of the 2500 managers. It is unknown.

The Director of Personnel does not have the time or the money required to develop a profile for all 2500 managers. He selects a simple random sample of 30 managers and finds that the mean annual salary for the sample is $69616.48.

The sample is the 30 managers randomly selected. The sample statistic is the mean annual salary of the 30 managers in the sample, $69616.48.
The Director of Personnel then uses the mean annual salary of the sample of 30 managers to estimate the mean annual salary of all 2500 managers. The process of using the mean annual salary of the sample of 30 managers as an estimate of the mean annual salary of all 2500 managers is known as statistical inference.

How do we know that the mean annual salary of the sample of 30 managers is a good estimate of the mean annual salary of all 2500 managers? Suppose we select another simple random sample of 30 managers and find that this sample has a mean annual salary of $71374.35. The sample mean annual salary will vary from sample to sample.

Sampling Distributions
• Sample statistics are random variables.
• The probability distribution of a sample statistic is called its sampling distribution.
• We use the sampling distribution to make inferences about the population parameters.

Sampling Distribution of the Sample Mean
• One of the most common statistical procedures involves using a sample mean \(\bar{x}\) to make inferences about an unknown population mean \(\mu\).
• We expect different samples to have different means.
• If we use simple random sampling, each possible sample of size n has the same probability of being selected.
• If we were to take every possible sample of size n from a population and calculate the mean for each sample, we would be able to find the probability distribution of the sample mean.

Example 1
To determine the sampling distribution of the sample mean annual salary we would need to calculate the sample mean for every possible sample of size 30. There are \(\binom{2500}{30} \approx 2.745832 \times 10^{69}\) different samples of 30 managers that can be taken from all 2500 managers. This is too many sample means to calculate.

We select 200 simple random samples of 30 managers and calculate the sample mean annual salary for each of the 200 samples.
Sample    Sample Mean Annual Salary
1         $69616.48
2         $71374.35
3         $72034.22
⋮         ⋮
200       $72589.54

The histogram of the 200 sample mean annual salaries will give an approximation of the sampling distribution.

[Figure: Histogram of the Sample Mean Annual Salaries of 200 Simple Random Samples of Size 30. Horizontal axis: Sample Mean Annual Salary, from 70000 to 74000. Vertical axis: Frequency, from 0 to 60.]

We can calculate the mean and the standard deviation of the sample mean annual salaries for the 200 samples. The mean of the 200 sample mean annual salaries is $71842.13. The standard deviation of the 200 sample mean annual salaries is $680.01.

The sampling distribution of the mean annual salary appears to be approximately normal with a mean of $71842.13 and a standard deviation of $680.01. If we were to take all \(2.745832 \times 10^{69}\) possible samples of 30 managers from all 2500 managers we would be able to find the exact sampling distribution.

Sampling Distribution of the Sample Mean
• In practice we take only a single sample from a population and hence use a single sample mean \(\bar{x}\) to make inferences about the population parameter \(\mu\).
• So how do we find the probability distribution of the sample mean without finding the means of all possible samples?
• We use some general results.

Mean and Standard Deviation of the Sampling Distribution of the Sample Mean
• If \(\bar{x}\) is the mean of a random sample of size n from a population with mean \(\mu\) and standard deviation \(\sigma\), then the mean and standard deviation of the sampling distribution of the sample mean are given by
\[ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
• Note that \(\sigma_{\bar{x}}\) is called the standard error of the mean.

Example 1
Suppose that information has been obtained from all 2500 managers.
The population mean annual salary is \(\mu\) = $71800.
The population standard deviation is \(\sigma\) = $4000.
A sample of 30 managers is to be taken. What are the mean and the standard deviation of the sampling distribution of the sample mean annual salary?
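
As an illustration of the repeated-sampling experiment and the standard error formula, the following Python sketch draws 200 simple random samples of size 30 from a simulated population. The population is an assumption made only for this sketch (2500 salaries generated from a normal distribution with mean 71800 and standard deviation 4000); it is not the EAI data, and the function name `simulate_sample_means` and the seed are hypothetical choices.

```python
import math
import random
import statistics


def simulate_sample_means(population, n, num_samples, rng):
    """Draw num_samples simple random samples of size n (without
    replacement) from population and return the list of sample means."""
    return [statistics.mean(rng.sample(population, n))
            for _ in range(num_samples)]


rng = random.Random(117)  # fixed seed so the run is repeatable

# Hypothetical population of 2500 salaries (assumption, not real data).
population = [rng.gauss(71800, 4000) for _ in range(2500)]
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 30
sample_means = simulate_sample_means(population, n, 200, rng)

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = sigma / math.sqrt(n)  # sigma / sqrt(n)

print(f"population mean:               {mu:.2f}")
print(f"mean of 200 sample means:      {mean_of_means:.2f}")
print(f"sd of 200 sample means:        {sd_of_means:.2f}")
print(f"standard error sigma/sqrt(n):  {standard_error:.2f}")
```

With only 200 samples the mean and standard deviation of the sample means should be close to, but not exactly equal to, the population mean and the standard error.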
The mean of the sampling distribution of the sample mean annual salary is
\(\mu_{\bar{x}} = \mu\) = $71800
The standard deviation of the sampling distribution of the sample mean annual salary is
\(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4000}{\sqrt{30}}\) = $730.30

Shape of the Sampling Distribution of the Sample Mean
• We have described the centre and the spread of the sampling distribution of the sample mean, but what about the shape?
• The shape depends on the shape of the population distribution.
• If the population is normally distributed, then the sampling distribution of the sample mean is also normally distributed.

Example 1
Suppose that the annual salary of all 2500 managers is normally distributed with a mean of $71800 and a standard deviation of $4000. A sample of 30 managers is to be taken. What is the sampling distribution of the mean annual salary of the sample?

Let X = the annual salary
\[ X \sim N(71800,\ 4000^2) \]
Let \(\bar{X}\) = the mean annual salary of a sample of size 30
We have already determined that the mean of the distribution of \(\bar{X}\) is $71800 and the standard deviation is $730.30. Since the population from which the sample is being drawn is normally distributed, the sampling distribution of the sample mean will be normally distributed.
\[ \bar{X} \sim N(71800,\ 730.30^2) \]

Shape of the Sampling Distribution of the Sample Mean
• What happens when the population distribution is not normal?
• It turns out that as the sample size increases, the distribution of \(\bar{X}\) gets closer to a normal distribution, no matter what shape the population distribution has.

Central Limit Theorem
If a random sample is drawn from any population, the sampling distribution of the sample mean is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of \(\bar{X}\) will resemble a normal distribution.

Large Sample Size
• How large does a sample need to be to be considered sufficiently large?
• Generally a sample size of \(n \ge 30\) is large enough to ensure that the sampling distribution of \(\bar{X}\) is approximately normal.
• However, if a population is extremely non-normal, the sampling distribution will also be non-normal, even for moderately large values of n.

Sampling Distribution of the Sample Mean
\[ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
If X is normally distributed, then \(\bar{X}\) is normally distributed.
If X is non-normal, then \(\bar{X}\) is approximately normally distributed for sufficiently large sample sizes.

Using the Sampling Distribution for Inference
• Recall from Topic 2 that if X is normally distributed with mean \(\mu\) and standard deviation \(\sigma\), then
\[ Z = \frac{X - \mu}{\sigma} \]
has a standard normal distribution with mean 0 and standard deviation 1.
• It follows that if \(\bar{X}\) is normally distributed with mean \(\mu_{\bar{x}} = \mu\) and standard deviation \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\), then
\[ Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
has a standard normal distribution with mean 0 and standard deviation 1.
• Hence we can use the standard normal tables to make inferences about sample means.

Example 2
A federal inspector for weights and measures visits a packaging plant to check that the net weight of packages is as indicated on the packages. The manager assures the inspector that the packaging process results in a mean weight of 750g with a standard deviation of 14g. The inspector selects 100 packages at random and finds their mean weight to be 748.5g. If the manager's claim is correct, how likely is a sample mean of 748.5g or less?

Let X = weight of a package
The manager claims that \(X \sim N(750,\ 14^2)\).
The inspector has taken a sample of size 100.
Let \(\bar{X}\) = mean weight of a sample of 100 packages
Using the Central Limit Theorem we know that \(\bar{X}\) will be approximately normally distributed with a mean and standard deviation of
\[ \mu_{\bar{x}} = \mu = 750\text{g}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{14}{\sqrt{100}} = 1.4\text{g} \]

If the manager's claim is correct, how likely is a sample mean of 748.5g or less?
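We want to find \(P(\bar{X} \le 748.5)\).

[Figure: distribution of \(\bar{X}\), with 748.5 and 750 marked on the horizontal axis.]

\[ P(\bar{X} \le 748.5) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \le \frac{748.5 - 750}{14/\sqrt{100}}\right) = P(Z \le -1.07) = 0.5 - 0.3577 = 0.1423 \]

[Figure: standard normal distribution of Z, with -1.07 and 0 marked on the horizontal axis.]

Example 3
The weight of a ’32g’ chocolate bar is normally distributed with a mean of 32.2g and a standard deviation of 0.3g.
a. If a customer buys one chocolate bar, what is the probability that the bar will weigh less than 32g?
b. If a customer buys a pack of 4 bars, what is the probability that the mean weight of the 4 bars will be less than 32g?

a. X = weight of a chocolate bar
\[ X \sim N(32.2,\ 0.3^2) \]
If a customer buys one chocolate bar, the probability that the bar will weigh less than 32g is
\[ P(X < 32) = P\left(\frac{X - \mu}{\sigma} < \frac{32 - 32.2}{0.3}\right) = P(Z < -0.67) = 0.5 - 0.2486 = 0.2514 \]

[Figure: distribution of X with 32 and 32.2 marked, and the corresponding standard normal distribution of Z with -0.67 and 0 marked.]

b. \(\bar{X}\) = mean weight of a sample of 4 chocolate bars
We know that \(X \sim N(32.2,\ 0.3^2)\), therefore
\[ \bar{X} \sim N\left(32.2,\ \frac{0.3^2}{4}\right) \]
We want to find the probability that if a customer buys a pack of 4 chocolate bars, the mean weight of the 4 bars will be less than 32g. Hence we want to find \(P(\bar{X} < 32)\).
\[ P(\bar{X} < 32) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{32 - 32.2}{0.3/\sqrt{4}}\right) = P(Z < -1.33) = 0.5 - 0.4082 = 0.0918 \]

[Figure: distribution of \(\bar{X}\) with 32 and 32.2 marked, and the corresponding standard normal distribution of Z with -1.33 and 0 marked.]

Compare the distribution of X to the distribution of \(\bar{X}\).

[Figure: the distributions of X and \(\bar{X}\), each with 32 and 32.2 marked on the horizontal axis.]

Reading for next lecture
• Chapter 8, Sections 8.1-8.3

Exercises
• 7.9
• 7.21
• 7.22
• 7.23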
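
The probabilities in Examples 2 and 3 can also be checked numerically rather than from tables. The sketch below is one way to do so using only the Python standard library; the helper names `normal_cdf` and `prob_sample_mean_below` are hypothetical and introduced here for illustration. It computes the standard normal cumulative probability from the error function.

```python
import math


def normal_cdf(z):
    """Cumulative probability P(Z <= z) for a standard normal Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_sample_mean_below(x, mu, sigma, n=1):
    """Return (z, P(sample mean <= x)) for a sample of size n drawn from a
    population with mean mu and standard deviation sigma, assuming the
    sample mean is (approximately) normally distributed.
    With n = 1 this is the probability for a single observation."""
    standard_error = sigma / math.sqrt(n)
    z = (x - mu) / standard_error
    return z, normal_cdf(z)


# Example 2: mean weight of 100 packages at most 748.5g
z2, p2 = prob_sample_mean_below(748.5, mu=750, sigma=14, n=100)

# Example 3a: one chocolate bar weighs less than 32g
z3a, p3a = prob_sample_mean_below(32, mu=32.2, sigma=0.3, n=1)

# Example 3b: mean weight of 4 chocolate bars is less than 32g
z3b, p3b = prob_sample_mean_below(32, mu=32.2, sigma=0.3, n=4)

for label, z, p in [("Example 2", z2, p2),
                    ("Example 3a", z3a, p3a),
                    ("Example 3b", z3b, p3b)]:
    print(f"{label}: z = {z:.4f}, probability = {p:.4f}")
```

The results differ from the table-based answers only in the third or fourth decimal place, because the tables require z to be rounded to two decimal places while the code uses the unrounded value.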