AP Stat Notes – Unit 6: Sampling Distributions

The basis for statistical inference is the question "How often would this method give a correct answer if I used it very many times?" We can use random sampling, which involves chance, to help us answer this question, because the laws of probability answer the question "What would happen if we did this many times?"

Think about random sampling as a random phenomenon like rolling dice or flipping a coin. How many times could I roll a pair of dice? Would I need to roll a pair of dice an infinite number of times to get a good idea of what the average sum of the dice would be? Wouldn't the average sum after, say, 100 rolls give me a good estimate of the true average sum, even though I have only looked at a small slice of the possible rolls? This is the same idea we use in sampling. We can take a random sample of, say, 100 people from a much larger population, and they should give us a good estimate of the true value for the population, even though they represent only a small slice of the whole.

Parameter – a number that describes a population.
Statistic – a number that describes a sample.

Measurement:   Mean    Std Dev    Proportion
Parameter:     μ       σ          π (or p)
Statistic:     x̄       s          p (or p̂)

A parameter is a fixed number. It is usually unknown because we cannot examine the entire population. A statistic is a number that will be known whenever we take a sample; however, it is not fixed and can change from sample to sample.

The big question then becomes, "How can I use a number taken from a small sample, a number that will vary with each new sample I choose, to estimate a fixed number that represents an entire population?" If the statistic changes with each new sample, how could I possibly say it represents the entire population? This is the idea of sampling variability: the value of the statistic will vary from sample to sample, but if the sample is chosen correctly the statistic should still be a good representation of the entire population and can therefore be used to estimate the parameter of interest.

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

The bias of a statistic
The bias of a statistic refers to the center of the sampling distribution. We would like the center of our sampling distribution to be located at the population parameter. A statistic is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. An unbiased statistic will still sometimes fall above the parameter and sometimes below, but it will have no tendency to overestimate or underestimate the parameter.

The variability of a statistic
The variability of a statistic is described by the spread of its sampling distribution. This spread is determined by the sampling design and the size of the sample. Larger samples give smaller spread. We would like a statistic that is unbiased and has as little variability as possible. That is to say, we want a statistic that is centered at the parameter we are trying to estimate, and we want that statistic to be as close as possible to that center each time. We can think of a sampling distribution as throwing darts at a dartboard: we want our darts to be centered on the bull's-eye (unbiased) and to all be close together (low variability). The four dartboards below represent each possible situation.
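To make bias and variability concrete, here is a minimal simulation sketch (Python; not part of the original notes). The population values μ = 50 and σ = 10, the sample sizes 10 and 100, and the 10,000 repetitions are arbitrary choices for illustration. It repeatedly draws samples, records each sample mean, and checks that the collection of sample means is centered at μ (unbiased) and that larger samples give a smaller spread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population for illustration: normal, mean 50, std dev 10.
mu, sigma = 50, 10
num_samples = 10_000          # number of repeated samples to draw

for n in (10, 100):           # two different sample sizes
    # Draw num_samples samples of size n and compute each sample mean.
    sample_means = rng.normal(mu, sigma, size=(num_samples, n)).mean(axis=1)

    # Center of the sampling distribution should be near mu (unbiased),
    # and the spread should shrink as n grows (less variability).
    print(f"n = {n:3d}: mean of sample means = {sample_means.mean():.2f}, "
          f"std of sample means = {sample_means.std():.2f} "
          f"(theory: {sigma / np.sqrt(n):.2f})")
```

Because σ/√n is 10/√10 ≈ 3.2 for n = 10 and 10/√100 = 1.0 for n = 100, the printed spreads should shrink accordingly while both centers stay near 50.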
Example 8.2, pg. 451; Example 8.3, pg. 453

The Sampling Distribution of x̄
When the objective of a statistical investigation is to make an inference about the population mean μ, it is natural to consider the sample mean x̄ as an estimate of μ. Let x̄ denote the mean of the observations in a random sample of size n from a population having mean μ and standard deviation σ. The sampling distribution of x̄ (the collection of all sample means from all possible samples of size n from this population) has a mean denoted by μ_x̄ and a standard deviation denoted by σ_x̄.

General Properties of the Sampling Distribution of x̄:
Rule 1: μ_x̄ = μ. The mean of the sampling distribution is equal to the mean of the population.
Rule 2: σ_x̄ = σ/√n. This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample.
Rule 3: When the population distribution is normal, the sampling distribution of x̄ is also normal for any sample size n.
Rule 4: Central Limit Theorem – When n is sufficiently large, the sampling distribution of x̄ is well approximated by a normal curve, even when the population distribution is not itself normal.

The Central Limit Theorem can safely be applied if n exceeds 30. For smaller sample sizes it may be applied only if the population is believed to be close to normal itself. We can see the application of the Central Limit Theorem below for two differently shaped populations. In (a) the population is normal, so no matter what the sample size, the sampling distribution of x̄ is also normal. In (b) the population is strongly skewed to the right, so we need a large enough sample size before the sampling distribution of x̄ looks normal.

The Sampling Distribution of a Sample Proportion
The objective of some statistical investigations is to draw conclusions about the proportion of individuals or objects in a population that possess a specified property. When this is the case, we use the symbol π to denote the proportion of successes in the population. The value of π is a number between 0 and 1. Like most parameters, this value is unknown to an investigator. A sample from the population is taken, and the results of that sample are used to estimate the true value of π. The statistic that provides a basis for making inferences about π is p (or p̂), the sample proportion of successes:

p = (# of successes in the sample) / n

Example 8.7, pg. 462; Example 8.8, pg. 463

General Properties of the Sampling Distribution of p (p̂):
Rule 1: μ_p = π. The mean of the sampling distribution of p is equal to the population proportion.
Rule 2: σ_p = √(π(1 − π)/n). This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample.
Rule 3: When n is large and π is not too near 0 or 1, the sampling distribution of p is approximately normal. The farther π is from 0.5, the larger n must be for a normal approximation to the sampling distribution of p to be accurate. A conservative rule of thumb is: nπ ≥ 5 and n(1 − π) ≥ 5.
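The rules above can be checked with a short simulation. The sketch below (Python; not part of the original notes) uses arbitrary illustration choices: an exponential population with mean 4 (strongly right-skewed) and sample size n = 40 for x̄, a population proportion π = 0.2 with n = 100 for p̂, and 10,000 repeated samples in each case. It verifies that the centers and spreads of the simulated sampling distributions match Rules 1 and 2 for each statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
reps = 10_000  # number of repeated samples

# --- Sampling distribution of x-bar (Central Limit Theorem) ---
# Hypothetical population: exponential with mean 4 (strongly right-skewed);
# for an exponential distribution the standard deviation equals the mean.
mu, sigma, n = 4.0, 4.0, 40
xbars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
print("x-bar: mean =", round(xbars.mean(), 3), " (Rule 1 says", mu, ")")
print("x-bar: std  =", round(xbars.std(), 3),
      " (Rule 2 says", round(sigma / np.sqrt(n), 3), ")")
# A histogram of xbars would look roughly normal even though the
# population is skewed, because n = 40 exceeds 30 (Rule 4).

# --- Sampling distribution of p-hat ---
pi, n = 0.2, 100                             # hypothetical population proportion
phats = rng.binomial(n, pi, size=reps) / n   # sample proportion from each sample
print("p-hat: mean =", round(phats.mean(), 4), " (Rule 1 says", pi, ")")
print("p-hat: std  =", round(phats.std(), 4),
      " (Rule 2 says", round(np.sqrt(pi * (1 - pi) / n), 4), ")")
# Normal approximation check: n*pi = 20 and n*(1 - pi) = 80, both at least 5.
```

A run of this sketch should print centers close to 4 and 0.2, with spreads close to 4/√40 ≈ 0.632 and √(0.2 × 0.8 / 100) = 0.04, matching the rules for μ_x̄, σ_x̄, μ_p, and σ_p.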