What Is a Sampling Distribution?

Population Distributions vs. Sampling Distributions

There are actually three distinct distributions involved when we sample repeatedly and measure a variable of interest.
1) The population distribution gives the values of the variable for all the individuals in the population.
2) The distribution of sample data shows the values of the variable for all the individuals in the sample.
3) The sampling distribution shows the values of the statistic from all possible samples of the same size from the population.

Describing Sampling Distributions

Center: Biased and unbiased estimators

In the chips example, we collected many samples of size 10 and calculated the sample proportion of red chips in each. How well does the sample proportion estimate the true proportion of red chips, p = 0.5? Note that the center of the approximate sampling distribution is close to 0.5. In fact, if we took ALL possible samples of size 10 and found the mean of those sample proportions, we’d get exactly 0.5.

Definition: A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

We examine a sampling distribution the same way we analyze any other distribution: CENTER, SHAPE, SPREAD, OUTLIERS.

To get a trustworthy estimate of an unknown population parameter, start by using a statistic that’s an unbiased estimator. This ensures that you won’t systematically overestimate or underestimate. Unfortunately, using an unbiased estimator doesn’t guarantee that the value of your statistic will be close to the actual parameter value.

[Figure: sampling distributions for n = 100 and n = 1000]

Larger samples have a clear advantage over smaller samples. They are much more likely to produce an estimate close to the true value of the parameter.

Spread: Low variability is better!
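To see the effect of sample size on spread, here is a minimal simulation sketch. It assumes a very large population with true proportion p = 0.5 (as in the chips example), so each draw is treated as independent; the function names and the choice of 1000 repetitions are illustrative, not part of the original slides.

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` random samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


def summarize(p, n, reps=1000, seed=0):
    """Return (mean, standard deviation) of the simulated sample proportions."""
    props = simulate_sample_proportions(p, n, reps, seed)
    return statistics.mean(props), statistics.stdev(props)


if __name__ == "__main__":
    # Sample sizes taken from the n = 100 vs. n = 1000 comparison
    for n in (100, 1000):
        center, spread = summarize(0.5, n)
        print(f"n={n}: mean of p-hat = {center:.4f}, sd of p-hat = {spread:.4f}")
```

Both simulated distributions should center near 0.5 (no bias), while the n = 1000 results should cluster noticeably more tightly around 0.5 than the n = 100 results.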
Variability of a Statistic

The variability of a statistic is described by the spread of its sampling distribution. This spread is determined primarily by the size of the random sample: larger samples give smaller spread. The spread of the sampling distribution does not depend on the size of the population, as long as the population is at least 10 times larger than the sample.

We can think of the true value of the population parameter as the bull’s-eye on a target and of the sample statistic as an arrow fired at the target. Both bias and variability describe what happens when we take many shots at the target.

Bias means that our aim is off and we consistently miss the bull’s-eye in the same direction. Our sample values do not center on the population value.

High variability means that repeated shots are widely scattered on the target. Repeated samples do not give very similar results.

The lesson about center and spread is clear: given a choice of statistics to estimate an unknown parameter, choose one with no or low bias and minimum variability.

Bias, variability, and shape

Section 7.2 Sampling Distribution of p̂

What’s in Store…

Today, we’ll focus on one sampling distribution: the sampling distribution of p̂. We’ll talk about the center, shape, and spread of the sampling distribution of p̂.
Center = mean
Spread = standard deviation

The Sampling Distribution of P-hat

$$\mu_{\hat{p}} = p$$

In words, the mean of the sampling distribution of p-hat is p. That makes p-hat an unbiased estimator of p.

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Let’s take a quick look at where these formulas come from.

Rules to live by

We learned that the sampling distribution of p-hat is approximately normal IF the sample size is large. How large is large enough? Hmmm… Also, the population should be at least 10 times the size of the sample.

Rules of Thumb and the Normal Approximation

We can use the normal approximation for p-hat ONLY when np ≥ 10 AND n(1-p) ≥ 10.
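One standard route to the mean and standard deviation of p-hat is sketched here. It assumes the count of successes X in the sample is (approximately) binomial with parameters n and p, which is the usual working assumption when the population is at least 10 times the sample size. The symbol X is introduced for illustration and does not appear in the slides.

```latex
\begin{align*}
\hat{p} &= \frac{X}{n}, \qquad X \sim \text{Binomial}(n, p) \\
\mu_{\hat{p}} &= \frac{E[X]}{n} = \frac{np}{n} = p \\
\sigma_{\hat{p}} &= \frac{\sigma_X}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}
\end{align*}
```

Dividing the binomial count by n rescales both its mean and its standard deviation by 1/n, which is why the spread of p-hat shrinks like the square root of n.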
We can use the formula for the standard deviation of p-hat only when the population is at least 10 times the sample size. In symbols, population ≥ 10n.

Example

A polling organization asks an SRS of 1500 first-year college students whether they applied for admission to any other college. In fact, 35% of all first-year students applied to colleges besides the one they are attending. What is the probability that the random sample of 1500 students will give a result within 2 percentage points of the true value?

State, Plan, Do, Conclude = Parameter, Conditions, Formula, Sentence

Parameter: p = proportion of all first-year college students who applied to more than one college (p = 0.35)
Conditions: np ≥ 10 and n(1 - p) ≥ 10; population ≥ 10n
Formula: I say “normal,” you say “Z-score!”
Sentence

Key Points

1) Always define the population of interest.
2) State the values of n, p, and 1-p.
3) Check BOTH rules of thumb by plugging in values.
4) Graph the distribution you’re interested in.
5) Convert to a Z-score. Make sure you know what the mean and standard deviation are for the problem.
6) State the probability with symbols.
7) Find the probability using Table A.
8) Write your conclusions in words in the context of the problem.

Next Example

One way of checking for undercoverage and nonresponse is to compare the sample with known facts about the population. Suppose 11% of Americans are left-handed. The proportion p-hat of left-handers in an SRS of 1500 adults should therefore be close to 0.11. If a national survey contains only 9.2% left-handers, should we suspect that the sampling procedure is somehow underrepresenting left-handed people? To answer this, we will find the probability that a sample of size 1500 contains no more than 9.2% left-handers.
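Both examples can be worked with the normal approximation for p-hat. The following is a minimal sketch: it uses Python's standard-library `NormalDist` in place of Table A, and the helper names are illustrative. It checks np ≥ 10 and n(1-p) ≥ 10 in code, but the population ≥ 10n condition has to be judged separately, since the population sizes are not given as numbers here.

```python
from math import sqrt
from statistics import NormalDist


def p_hat_distribution(p, n):
    """Normal approximation to the sampling distribution of p-hat."""
    # Rule of thumb for using the normal approximation
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("need np >= 10 and n(1-p) >= 10")
    # Mean p, standard deviation sqrt(p(1-p)/n)
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


def prob_within(p, n, margin):
    """P(p - margin <= p-hat <= p + margin) under the normal approximation."""
    dist = p_hat_distribution(p, n)
    return dist.cdf(p + margin) - dist.cdf(p - margin)


def prob_at_most(p, n, value):
    """P(p-hat <= value) under the normal approximation."""
    return p_hat_distribution(p, n).cdf(value)


if __name__ == "__main__":
    # College example: p = 0.35, n = 1500, within 2 percentage points
    print(f"P(0.33 <= p-hat <= 0.37) = {prob_within(0.35, 1500, 0.02):.4f}")
    # Left-handed example: p = 0.11, n = 1500, at most 9.2%
    print(f"P(p-hat <= 0.092) = {prob_at_most(0.11, 1500, 0.092):.4f}")
```

Under these assumptions the first probability comes out at roughly 0.90 and the second at roughly 0.013. Values read from Table A may differ slightly because the Z-scores are rounded to two decimal places there.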