Sampling Distributions
A review by Hieu Nguyen (03/27/06)

Parameter vs Statistic
A parameter describes the entire population.
Example: A parameter for the US population is the proportion of all people who support President Bush's nomination of Samuel Alito to the Supreme Court. $p = .74$
A statistic describes a sample taken from the population. It is only an estimate of the population parameter.
Example: In a poll of 1001 Americans, 73% of those surveyed supported Alito's nomination. $\hat{p} = .73$

Bias
Bias is the systematic difference between a statistic and the population parameter it estimates. A statistic is unbiased if the mean of its sampling distribution (its average over all possible samples) equals the population parameter. A single sample does not have to match the parameter exactly.
Example: The poll's $\hat{p}$ is unbiased if, across many repeated polls, the values of $\hat{p}$ average out to $p = .74$, even though any one poll may return .73 or .75.

Sampling Variability
Samples naturally have varying results. The mean or sample proportion of one sample may be different from that of another.
In the poll mentioned before, $\hat{p} = .73$. A repetition of the same poll may have $\hat{p} = .75$.

Central Limit Theorem (CLT)
Populations that are wildly skewed may cause samples to vary a great deal. However, the CLT states that when the sample size is large enough, the sample proportions (or means) from repeated random samples are approximately normally distributed and centered on the population parameter, whatever the shape of the population.
The CLT is related to the law of large numbers, which says that the sample mean gets closer to the population mean as the sample size grows.

CLT Example
Imagine that many polls of 1001 Americans are done to find the proportion of those who supported Alito's nomination. Although the poll results vary, more samples have a sample proportion that is close to the population parameter $p = .74$.
Plot the sample proportions of all the polls to see the effects of the CLT. Notice how there are more sample proportions near the population parameter $p = .74$. This histogram approximates a sampling distribution.

Sampling Distributions: Definition
Textbook definition: A sampling distribution is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
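The repeated-poll idea in the CLT example can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate_sample_proportions`, the seed, and the choice of 1000 simulated polls are hypothetical and not part of the original slides. Each simulated respondent supports the nomination with probability p = .74, independently of the others.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each respondent is a "success" with probability p (assumed independent).
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions


# 1000 hypothetical polls of 1001 people each, with p = .74
p_hats = simulate_sample_proportions(p=0.74, n=1001, num_samples=1000)
print("mean of simulated p-hats:", round(mean(p_hats), 4))
print("SD of simulated p-hats:  ", round(stdev(p_hats), 4))
print("sqrt(p*q/n) for reference:", round(sqrt(0.74 * 0.26 / 1001), 4))
```

A histogram of `p_hats` should look roughly bell-shaped and centered near .74, with a spread close to the reference value printed on the last line.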
In other words, a sampling distribution is a histogram of the statistics from all samples of the same size from a population.

Two Most Common Types of Sampling Distributions
Sample Proportion Distribution: the distribution of the sample proportions of samples from a population
Sample Mean Distribution: the distribution of the sample means of samples from a population
For both types, the ideal shape is a normal distribution.

Sampling Distributions: Conditions
Before assuming that a sampling distribution is normal, check the following conditions:
- Plausible independence
- Randomness
- Each sample is less than 10% of the population

Sampling Distributions As Normal Distributions
When all conditions are met, the sampling distribution can be considered a normal distribution with a center and a spread.
Note: With sample proportion distributions, another condition must be met, the success-failure condition: there must be at least 10 successes and 10 failures according to the population parameter and sample size ($np \ge 10$ and $nq \ge 10$).

Sampling Distributions As Normal Distributions: Equations
Sample Proportion Distribution
$p$ = population proportion (given), $q = 1 - p$
$$SD(\hat{p}) = \sqrt{\frac{pq}{n}}, \qquad \text{model: } N\big(p,\ SD(\hat{p})\big)$$
Sample Mean Distribution
$\mu$ = population mean (given)
$\sigma$ = population standard deviation (given)
$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}, \qquad \text{model: } N\big(\mu,\ SD(\bar{y})\big)$$

Sampling Distributions As Normal Distributions: Note
Note: If any of the parameters are unknown, use the statistics from a sample to approximate them.

Using Sampling Distributions
Sampling distributions can estimate the probability of getting a certain statistic in a random sample. Use z-scores or the NormalCDF function on the TI-83/84.

Using Sampling Distributions: Z-Scores w/ Example
Use the z-score table to find appropriate probabilities.
Example: Find the probability that a poll of 1001 Americans will return a sample proportion of .72 or less in support of Alito's nomination.
$$z = \frac{\hat{p} - p}{SD(\hat{p})}$$
Then look up $P(\hat{p} \le \text{observed value})$ or $P(\hat{p} \ge \text{observed value})$, depending on the question.
$$p = .74$$
$$SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.74 \times .26}{1001}} \approx .0139$$
$$z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{.72 - .74}{.0139} \approx -1.443$$
$$P(\hat{p} \le .72) \approx .0749 \quad \text{(z-table value for } z = -1.44\text{)}$$

Using Sampling Distributions: NormalCDF Function w/ Example
The syntax for the NormalCDF function is: NormalCDF(lower limit, upper limit, μ, σ)
Example: Find the probability that a sample of size 25 will have a mean of 5 or less, given that the population has a mean of 7 and a standard deviation of 3.
$$\mu = 7, \qquad \sigma = 3, \qquad SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{25}} = .6$$
NormalCDF(0, 5, 7, .6) ≈ .000429
(The lower limit 0 stands in for −∞ here; the area below 0 is negligible.)

Sampling Distribution for Two Populations
Use a difference sampling distribution if the question presents 2 different populations. For independent $x$ and $y$:
$$\mu_{x-y} = \mu_x - \mu_y, \qquad \sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2}$$

Sampling Distribution for Two Populations: Example
(adapted from AP Statistics – Chapter 9 – Sampling Distribution Multiple Choice Questions)
Medium oranges have a mean weight of 14 oz and a standard deviation of 2 oz. Large oranges have a mean weight of 18 oz and a standard deviation of 3 oz. Find the probability of finding a medium orange that weighs more than a large orange.
$$\mu_x = 14, \quad \sigma_x = 2, \quad \mu_y = 18, \quad \sigma_y = 3$$
$$\mu_{y-x} = \mu_y - \mu_x = 18 - 14 = 4$$
$$\sigma_{y-x} = \sqrt{\sigma_y^2 + \sigma_x^2} = \sqrt{3^2 + 2^2} \approx 3.606$$
NormalCDF(−∞, 0, 4, 3.606) ≈ .134

Example Problem
(adapted from De Veaux, Sampling Distribution Models, Exercise #42)
Ayrshire cows average 47 pounds of milk a day, with a standard deviation of 6 pounds. For Jersey cows, the mean daily production is 43 pounds, with a standard deviation of 5 pounds. Assume that Normal models describe milk production for these breeds.
A) We select an Ayrshire at random. What's the probability that she averages more than 50 pounds of milk a day?
B) What's the probability that a randomly selected Ayrshire gives more milk than a randomly selected Jersey?
C) A farmer has 20 Jerseys. What's the probability that the average production for this small herd exceeds 45 pounds of milk a day?
D) A neighboring farmer has 10 Ayrshires. What's the probability that his herd average is at least 5 pounds higher than the average for the Jersey herd?
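Before the solution, the NormalCDF results from the worked examples so far (the poll, the sample of 25, and the oranges) can be reproduced off the calculator. The sketch below is a rough stand-in for the TI-83/84 function, built on Python's standard-library `statistics.NormalDist`; the helper name `normal_cdf` is hypothetical and chosen only to mirror the calculator syntax.

```python
from math import inf, sqrt
from statistics import NormalDist


def normal_cdf(lower, upper, mu, sigma):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Poll example: P(p-hat <= .72) with p = .74 and n = 1001
sd_p_hat = sqrt(0.74 * 0.26 / 1001)
poll_prob = normal_cdf(-inf, 0.72, 0.74, sd_p_hat)

# Sample-mean example: n = 25, mu = 7, sigma = 3, so SD(y-bar) = 3 / sqrt(25)
mean_prob = normal_cdf(0, 5, 7, 3 / sqrt(25))

# Orange example: (large - medium) has mean 18 - 14 and SD sqrt(3^2 + 2^2)
orange_prob = normal_cdf(-inf, 0, 18 - 14, sqrt(3**2 + 2**2))

print(round(poll_prob, 4), round(mean_prob, 6), round(orange_prob, 3))
```

The poll probability computed this way comes out slightly below the .0749 read from the z-table, because the table lookup rounds z to two decimal places.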
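Example Problem Solution
First, check the assumptions:
- Independent samples
- Randomness
- Sample represents less than 10% of population

A) Use the normal model to estimate the appropriate probability.
$$\mu = 47, \qquad \sigma = 6$$
$$z = \frac{x - \mu}{\sigma} = \frac{50 - 47}{6} = .5, \qquad P(x > 50) \approx .309$$
NormalCDF(50, ∞, 47, 6) ≈ .309

B) Create a normal model for the difference between Ayrshires and Jerseys. Use the model to estimate the appropriate probability.
$$\mu_a = 47, \quad \sigma_a = 6, \quad \mu_j = 43, \quad \sigma_j = 5$$
$$\mu_{a-j} = \mu_a - \mu_j = 47 - 43 = 4$$
$$\sigma_{a-j} = \sqrt{\sigma_a^2 + \sigma_j^2} = \sqrt{6^2 + 5^2} \approx 7.810$$
$$z = \frac{x - \mu_{a-j}}{\sigma_{a-j}} = \frac{0 - 4}{7.810} \approx -.512, \qquad P(x > 0) \approx .696$$
NormalCDF(0, ∞, 4, 7.810) ≈ .696

C) Create a sampling distribution model for which n = 20 Jerseys. Use the model to estimate the appropriate probability.
$$\mu = 43, \qquad \sigma = 5, \qquad n = 20$$
$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{20}} \approx 1.118$$
$$z = \frac{\bar{y} - \mu}{SD(\bar{y})} = \frac{45 - 43}{1.118} \approx 1.789, \qquad P(\bar{y} > 45) \approx .0367 \quad \text{(z-table value for } z = 1.79\text{)}$$
NormalCDF(45, ∞, 43, 1.118) ≈ .0368

D) First create a sampling distribution model for 10 random Ayrshires and 20 random Jerseys. Then create a normal model for the difference between the 10 Ayrshires and 20 Jerseys.
$$\mu_a = 47, \quad \sigma_a = 6, \quad n_a = 10, \qquad SD(\bar{y}_a) = \frac{\sigma_a}{\sqrt{n_a}} = \frac{6}{\sqrt{10}} \approx 1.897$$
$$\mu_j = 43, \quad \sigma_j = 5, \quad n_j = 20, \qquad SD(\bar{y}_j) = \frac{\sigma_j}{\sqrt{n_j}} = \frac{5}{\sqrt{20}} \approx 1.118$$
$$\mu_{a-j} = \mu_a - \mu_j = 47 - 43 = 4$$
$$\sigma_{a-j} = \sqrt{SD(\bar{y}_a)^2 + SD(\bar{y}_j)^2} = \sqrt{1.897^2 + 1.118^2} \approx 2.202$$
$$z = \frac{x - \mu_{a-j}}{\sigma_{a-j}} = \frac{5 - 4}{2.202} \approx .454, \qquad P(x \ge 5) \approx .325$$
NormalCDF(5, ∞, 4, 2.202) ≈ .325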
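The four answers can be cross-checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the z-table and NormalCDF; the function name `upper_tail` and the variable names are hypothetical, and the code assumes, as the problem does, that milk production is Normal and that the cows and herds are independent.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail(cutoff, mu, sigma):
    """P(X > cutoff) for X ~ N(mu, sigma)."""
    return 1 - NormalDist(mu, sigma).cdf(cutoff)


MU_A, SIGMA_A = 47, 6  # Ayrshire mean and SD (pounds per day)
MU_J, SIGMA_J = 43, 5  # Jersey mean and SD (pounds per day)

# A) one Ayrshire gives more than 50 pounds
prob_a = upper_tail(50, MU_A, SIGMA_A)

# B) one Ayrshire out-produces one Jersey: (a - j) > 0
prob_b = upper_tail(0, MU_A - MU_J, sqrt(SIGMA_A**2 + SIGMA_J**2))

# C) mean of 20 Jerseys exceeds 45 pounds
sd_j_herd = SIGMA_J / sqrt(20)
prob_c = upper_tail(45, MU_J, sd_j_herd)

# D) mean of 10 Ayrshires beats the mean of 20 Jerseys by at least 5 pounds
sd_a_herd = SIGMA_A / sqrt(10)
prob_d = upper_tail(5, MU_A - MU_J, sqrt(sd_a_herd**2 + sd_j_herd**2))

for label, prob in zip("ABCD", (prob_a, prob_b, prob_c, prob_d)):
    print(label, round(prob, 3))
```

Small differences in the last decimal place against the table-based values are expected, since the z-table rounds z to two decimals and the hand calculation rounds the intermediate standard deviations.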