Chapter 7: Sampling Distributions

7.1 Generating Sampling Distributions

Objectives
To understand what a sampling distribution is.
To generate exact and simulated sampling distributions.
To know the shapes, centers, and spreads of typical sampling distributions.
To understand the properties of point estimators.

Understanding the concept of a sampling distribution is very important because it underpins the entire study of inference in the remaining chapters.

Reminder: A parameter is a number that describes the population. In statistical practice, the value of a parameter is not known because we cannot normally examine the entire population. e.g. μ, σ, ρ, etc. (Greek letters)

A statistic is a number that can be computed from sample data without using any of the unknown parameters. e.g. x̄, s, r, etc. (English letters)

The sampling distribution of a statistic is the distribution of the values that statistic takes over all possible random samples of the same size from the same population.

Another important thing to understand is the difference between the sampling distribution of a summary statistic and a simulated sampling distribution. For discrete populations, the sampling distribution is the ideal pattern that would emerge if you looked at all possible samples of size n from a population. You can sometimes create an exact sampling distribution by listing all possible samples. However, because the number of possible samples can be extremely large, we are rarely able to list them all to construct the exact sampling distribution. Instead, we rely on theory or simulation to generate an approximate sampling distribution.

You can generate a simulated sampling distribution of any statistic by following these steps:
1. Take a random sample of size n from a population.
2. Compute a summary statistic.
3. Repeat steps 1 and 2 many times.
4. Display the distribution of the summary statistics.
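The four simulation steps can be sketched in Python. This is a minimal illustration under stated assumptions, not a method from the text: the population is hypothetical (the whole numbers 1 to 100, whose mean is 50.5), each sample of size 5 is drawn without replacement using the standard library's `random.sample`, and the summary statistic is the sample mean. The helper name `simulate_sampling_distribution` is hypothetical.

```python
import random
import statistics
from collections import Counter


def simulate_sampling_distribution(population, n, statistic, reps, seed=None):
    """Hypothetical helper: return `reps` simulated values of `statistic`."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):                     # Step 3: repeat steps 1 and 2 many times
        sample = rng.sample(population, n)    # Step 1: random sample of size n (no replacement)
        results.append(statistic(sample))    # Step 2: compute the summary statistic
    return results


# Assumption: a hypothetical population, the whole numbers 1 to 100 (mean 50.5)
population = list(range(1, 101))
means = simulate_sampling_distribution(
    population, n=5, statistic=statistics.mean, reps=10_000, seed=1
)

# Step 4: display the distribution as a text histogram (bins of width 10,
# one star per 100 simulated sample means)
bins = Counter(int(m // 10) * 10 for m in means)
for start in sorted(bins):
    print(f"{start:3d}-{start + 9:3d} | {'*' * (bins[start] // 100)}")

print("population mean:", statistics.mean(population))
print("mean of simulated sample means:", round(statistics.mean(means), 2))
print("SD of simulated sample means:", round(statistics.stdev(means), 2))
```

Because this is a simulation, the printed center and spread change slightly with the seed and the number of repetitions. The center of the simulated distribution should sit close to the population mean of 50.5; increasing `n` should make the histogram narrower.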
Note: Both exact and simulated sampling distributions have appeared on the AP Exam, so you should have experience constructing both.

Properties of Point Estimators

Sampling distributions are the connecting link between the collection of data (through sampling or experiments) and statistical inference (the process of drawing conclusions from the data).

A point estimate is a single value (summary statistic) calculated from sample data. It serves as the “best guess” for the unknown population parameter.

There are two properties that you would like the summary statistic to have:

It should be unbiased. This means:
mean of the sampling distribution = parameter being estimated
In other words, a summary statistic is a biased estimator of a population parameter if it gives results that are too large or too small on average. This idea exactly parallels the idea of bias in a sample survey: a method of sampling is biased if, on average, it produces a summary statistic that is too small or too large. Note: bias is a property of the method, not of an individual sample.

It should have as little variability as possible, and its standard error should decrease as the sample size increases.

Example: P4 page 419

1. Every year, Forbes magazine releases a list of the top-earning dead celebrities. In 2005, the top six and their yearly earnings were:
a. Elvis Presley, $45 million
b. Charles M. Schulz, $35 million
c. John Lennon, $22 million
d. Andy Warhol, $16 million
e. Theodor “Dr. Seuss” Geisel, $10 million
f. Marlon Brando, $9 million
Your talent agency gets an opportunity to represent two of these dead celebrities, to be selected at random. You will be paid 10% of their earnings.
(a) What is the most you could be paid? The least?
(b) Construct the sampling distribution of your total possible earnings.
(c) What is the probability that you will be paid $3 million or more?
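Because there are only C(6, 2) = 15 possible pairs, this example is small enough to build the exact sampling distribution by listing every sample. The Python sketch below assumes the two celebrities are chosen without replacement with every pair equally likely, and that your pay is 10% of the pair's combined 2005 earnings. The function name `exact_sampling_distribution` is hypothetical. Python's `fractions.Fraction` keeps amounts such as $3.0 million exact, so the "$3 million or more" comparison is not affected by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# 2005 yearly earnings in millions of dollars, taken from the example
earnings = {
    "Elvis Presley": 45,
    "Charles M. Schulz": 35,
    "John Lennon": 22,
    "Andy Warhol": 16,
    'Theodor "Dr. Seuss" Geisel': 10,
    "Marlon Brando": 9,
}


def exact_sampling_distribution(values, n, commission):
    """Hypothetical helper: map each possible payment to the number of samples giving it."""
    dist = Counter()
    for sample in combinations(values, n):   # every possible sample of size n
        dist[commission * sum(sample)] += 1  # payment for this sample
    return dist


dist = exact_sampling_distribution(earnings.values(), 2, Fraction(1, 10))
total_samples = sum(dist.values())

# (b) the sampling distribution: each payment with its probability
for pay, count in sorted(dist.items()):
    print(f"${float(pay):.1f} million: {count}/{total_samples}")

# (a) most and least, (c) P(pay >= $3 million)
most, least = max(dist), min(dist)
p_3_or_more = Fraction(sum(c for pay, c in dist.items() if pay >= 3), total_samples)
print("most:", float(most), "least:", float(least), "P(pay >= 3):", p_3_or_more)

# Center check: mean payment versus 2 x 10% x population mean
mean_pay = sum(pay * count for pay, count in dist.items()) / total_samples
pop_mean = Fraction(sum(earnings.values()), len(earnings))
print("mean payment:", round(float(mean_pay), 3),
      "matches 2 x 10% x population mean:", mean_pay == 2 * Fraction(1, 10) * pop_mean)
```

All 15 pair totals happen to be different, so each payment has probability 1/15. The most you could be paid is $8.0 million (Presley and Schulz) and the least is $1.9 million (Dr. Seuss and Brando). Twelve of the 15 pairs pay $3 million or more, so the probability is 12/15 = 4/5. The final check shows that the mean of this sampling distribution equals the quantity it estimates, which is the unbiasedness property described above.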