Chapter 8 – Sampling Distributions Defn: Sampling error is the error

Chapter 8 – Sampling Distributions Defn: Sampling error is the error resulting from using a sample to infer a population characteristic. Example: We want to estimate the mean amount of Pepsi-Cola in 12-oz. cans coming off an assembly line by choosing a random sample of 16 cans, and using the sample mean as an estimate of the mean for the population of cans. Suppose that we choose 100 random samples of size 16 and compute the sample mean for each of these samples. These 100 values of X will differ from each other somewhat due to sampling error, but the values should all be close to 12oz. Defn: For a random variable X, and a given sample size n, the distribution of the variable X , i.e., of all possible values of X , is called the sampling distribution of the mean. This probability distribution is a set of pairs of numbers. In each pair, the first number is a possible value of the sample mean, and the second number is the probability of obtaining that value of the mean occur when we select a random sample from the population. Properties of the Sampling Distribution of the Mean: 1) For samples of size n, the expectation (mean) of other words, X  X . X , equals the expectation (mean) of X. In 2) The possible values of X cluster closer around the population mean for larger samples than for smaller samples. In other words, the larger the sample size, the smaller the sampling error.  X , will be X   X In particular, we have n In particular, the standard deviation of the sampling distribution of the means, smaller than the population standard deviation, X. , where n is the sample size. Example (Continued): To visualize the concept of the sampling distribution of the mean, let us return to the above example. If we were to measure the amount of Pepsi-Cola in each 12-oz. can, we would obtain a list of values: 𝑥1 , 𝑥2 , 𝑥3 , …. Here, 𝑥1 is the amount of Pepsi-Cola in the first can selected; 𝑥2 is the amount of Pepsi-Cola in the second can selected; etc. If we were to construct a histogram for all of these values, that histogram would represent the population distribution. Assuming that the distribution of fill amounts is normal, the histogram would be a bell-shaped curve centered at 𝜇𝑋 = 12 𝑜𝑧. and with spread given by some (relatively) small amount, such as 𝜎𝑋 = 0.11 𝑜𝑧. Now, instead, let us consider the average fill amount per can for random samples of 16 cans. We select a random sample of 16 cans, measure the amount in each can, and calculate the sample mean. We do this repeatedly, and obtain a list of values: 𝑥̅1 is the mean fill amount for the first sample of 16 cans; 𝑥̅2 is the mean fill amount for the second sample of 16 cans; 𝑥̅3 is the mean fill amount for the third sample of 16 cans; etc. If we were to construct a histogram for all of these values, that histogram would represent the sampling distribution of the mean for samples of size 16. It would be a bell-shaped curve centered at 𝜇𝑋̅ = 𝜇𝑋 = 12 𝑜𝑧. and with spread given by 𝜎 𝜎𝑋̅ = 𝑋 = 0.0275 𝑜𝑧. √16 Defn: The standard deviation of the sampling distribution of the mean is called the standard error of the mean. Property 2 says that for a given population, and a given random variable defined for the members of that population, the standard error of the mean is smaller for larger sample sizes. To visualize what this means, let’s look at an example. Example: Consider the adult population of the United States. We have an IQ test that has been developed to assess the intelligence of individuals in this population. The distribution of IQ scores in the population is normal with mean µ = 100 and standard deviation σ = 15. This means that if we were to measure the IQ of each adult American, and do a histogram of the data, we would get a bell-shaped curve centered at µ = 100 and with standard deviation σ = 15. Now suppose that we select a simple random sample of size n = 4 from the population, administer the IQ test to each person in the sample, and find the sample mean. If we were to do this repeatedly, so that we obtain sample mean values for all possible samples of size n = 4, and do a histogram of those numbers, the histogram would be a bell-shaped curve centered at 𝜇𝑋̅ = 𝜎 𝜇 = 100 and with standard deviation 𝜎𝑋̅ = = 7.5. √4 Suppose that, instead, we consider all possible samples of size n =100 from the population, and for each sample, we obtain the sample mean IQ score. If we do a histogram of all of these numbers, the histogram will be a bell-shaped curve centered at 𝜇𝑋̅ = 𝜇 = 100 and with standard 𝜎 deviation 𝜎𝑋̅ = = 1.5. √100 These three curves are shown together in the graph below: The following theoretical result from probability theory is fundamental for our work in statistical inference. The Central Limit Theorem: For large (n  30) sample sizes, the random variable X has an approximate normal distribution, with mean other words, the random variable Z   X   X and standard deviation  X  X n . In X  X has an approximate standard normal distribution.  X     n This theorem holds regardless of the type of population distribution. The population distribution could be normal; it could be uniform (equally likely outcomes); it could be strongly positively skewed; it could be strongly negatively skewed. Regardless of the shape of the population distribution, the sampling distribution of the mean will be approximately normal. Example: p. 390, Exercise 20 Example: p. 390, Exercise 22 Example: p. 391, Exercise 27 The Sampling Distribution of the Sample Proportion Assume that we have a (large) population, which is divided into two subpopulations. In one subpopulation, each member possesses a certain characteristic; in the other subpopulation, each member does not possess this characteristic. Assume that the proportion of members of the entire population who possess the characteristic is p. We select a simple random sample of size n from the population. We are interested in the proportion of the members of the sample who possess the characteristic of interest. This proportion is called the sample proportion, denoted by 𝑝̂ . The Central Limit Theorem tells us that, if the sample size is “large,” then a) The shape of the sampling distribution of 𝑝̂ is approximately normal, having mean p and 𝑝(1−𝑝) standard deviation √ 𝑛 , provided 𝑛𝑝(1 − 𝑝) ≥ 10. b) Under the same conditions, the distribution of 𝑍= is approximately standard normal. Example: p. 399, Exercise 15. Example: p. 399, Exercise 21 Example: p. 399, Exercise 23 𝑝̂ − 𝑝 √𝑝(1 − 𝑝) 𝑛

Chapter 8 – Sampling Distributions Defn: Sampling error is the error

Related documents

Products

Support

Chapter 8 – Sampling Distributions Defn: Sampling error is the error

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib