Chapter 8
Sampling Variability and Sampling Distributions

JellyBlubbers: Episode 4 A New Curve

Well, JellyBlubbers have taken control of my backyard, but a solution to the JellyBlubber invasion is in the works! I have determined that Jelly aggression is the result of the skewed nature of the distribution of their lengths. Jellies require balance in all things, and as long as they continue to focus on such a skewed distribution, it upsets them. So, let's see if we can help the Jellies out!

1. Use a random number generator (a table or your calculator) to generate a random sample of 20 JellyBlubbers.
2. Find the sample mean length of the set.
3. Graph the sample mean length on the dotplot at the front of the room.
4. Repeat the process with a new random sample.

If we wanted to do this for every possible sample of size 20, how many sample mean lengths would we have to find?

Does it appear that this new distribution the Jellies will focus on, the sampling distribution of the sample means, will allow us to achieve peace and balance in my backyard? Why?

BASIC TERMS
• Statistic: Any quantity computed from values in a sample.
• Sampling variability: The observed value of a statistic depends on the particular sample selected from the population; typically, it varies from sample to sample.

Sample This!
Q: How many possible different samples of 5 M&Ms are in a bag of 500 M&Ms?
A: Too many… Not really. There are 500C5, or 255,244,687,600, possible samples of 5 M&Ms. This is called the Population of Samples.

BASIC TERMS, cont.
• Sampling Distribution: The distribution of a statistic. If the population of samples is relatively small, a sampling distribution can be displayed in a table just like any other probability distribution!
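As a quick check of the M&M count above, here is a minimal Python sketch. It assumes samples are unordered and drawn without replacement, which is what 500C5 counts; the helper name `count_samples` is only illustrative.

```python
import math


def count_samples(population_size: int, sample_size: int) -> int:
    """Count distinct unordered samples drawn without replacement (N choose n)."""
    return math.comb(population_size, sample_size)


# Bag of 500 M&Ms, samples of 5 (the numbers from the slide above).
m_and_m_samples = count_samples(500, 5)
print(f"{m_and_m_samples:,}")  # 255,244,687,600
```

The same call would answer the JellyBlubber question once the number of Jellies in the population is known; that number is not given here.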
Sampling Distribution of Sample Means: An Example…

MHS has the following senior football starters and their weights in pounds: Aaron-220, Brad-200, Chris-170, Derek-180, Eric-190, Frank-210, and George-160. Suppose this is the population we are interested in. The mean and standard deviation of this population are:

$\mu = \dfrac{220 + 200 + 170 + 180 + 190 + 210 + 160}{7} = 190$ lbs
$\sigma = 20$ lbs

To create a sampling distribution of sample means from every combination of two players, let's create the population of samples.

    Sample  Mean    Sample  Mean    Sample  Mean
    AB      210     BD      190     CG      165
    AC      195     BE      195     DE      185
    AD      200     BF      205     DF      195
    AE      205     BG      180     DG      170
    AF      215     CD      175     EF      200
    AG      190     CE      180     EG      175
    BC      185     CF      190     FG      185

Create the sampling distribution as a probability distribution, then create the dotplot for the original data and for the sample means. Compare them.

Results!!!
• Original data: $\mu = 190$, $\sigma = 20$ (Graph)
• Sample means: $\mu_{\bar{x}} = 190$, $\sigma_{\bar{x}} = 12.91$ (Graph)

Now create the population OF SAMPLES for samples of size 3! Create the sampling distribution and the dotplot.

    Sample  Mean    Sample  Mean    Sample  Mean
    ABC     ____    AFG     ____    CEF     ____
    ABD     ____    BCD     ____    CEG     ____
    ABE     ____    BCE     ____    CFG     ____
    ABF     ____    BCF     ____    DEF     ____
    ABG     ____    BCG     ____    DEG     ____
    ACD     ____    BDE     ____    DFG     ____
    ACE     ____    BDF     ____    EFG     ____
    ACF     ____    BDG     ____
    ACG     ____    BEF     ____
    ADE     ____    BEG     ____
    ADF     ____    BFG     ____
    ADG     ____    CDE     ____
    AEF     ____    CDF     ____
    AEG     ____    CDG     ____

What Do You Notice About the Sampling Distributions of Sample Means As the Sample Size Increases From the Parent Population?
• What type of distribution is the parent population?
• What is the mean (center) of the parent population?
• What is the standard deviation of the parent population?
• What shape (type of distribution) is the sampling distribution of sample means of sample size 2?
• What shape (type of distribution) is the sampling distribution of sample means of sample size 3?
• Remind me. What was the center (mean) value of the sampling distribution of sample means of sample size 2?
• What appears to be the center (mean) value of the sampling distribution of sample means of sample size 3?
• The standard deviation of the sampling distribution of sample means of sample size 2 is 13.2288 and for sample size 3 is 9.5665. How does this compare to the parent population standard deviation of 20? (Note: 13.2288 comes from applying the n − 1 divisor to the 21 sample means; dividing by 21 instead gives the 12.91 shown in the Results.)

Example 2…
• Consider a very large population that consists of the numbers 1, 2, 3, 4 and 5, generated in such a manner that the probability of each of those values is 0.2 no matter what the previous selections were. This population could be described as the outcome associated with a spinner such as given below, with the distribution next to it.

    x      1     2     3     4     5
    p(x)   0.2   0.2   0.2   0.2   0.2

Example 2, cont.
• If the sampling distribution for the means of samples of size two is analyzed, it looks like…

Population of Samples

    Sample  Mean    Sample  Mean
    1, 1    1       3, 4    3.5
    1, 2    1.5     3, 5    4
    1, 3    2       4, 1    2.5
    1, 4    2.5     4, 2    3
    1, 5    3       4, 3    3.5
    2, 1    1.5     4, 4    4
    2, 2    2       4, 5    4.5
    2, 3    2.5     5, 1    3
    2, 4    3       5, 2    3.5
    2, 5    3.5     5, 3    4
    3, 1    2       5, 4    4.5
    3, 2    2.5     5, 5    5
    3, 3    3

Sampling Distribution

    Sample mean   1     1.5   2     2.5   3     3.5   4     4.5   5     Total
    frequency     1     2     3     4     5     4     3     2     1     25
    probability   0.04  0.08  0.12  0.16  0.20  0.16  0.12  0.08  0.04

Example 2, cont.
• The original distribution and the sampling distribution of means of samples with n = 2 are given below.
(Figure: original distribution; sampling distribution, n = 2)

Example 2, cont.
• Sampling distributions for n = 3 and n = 4 were calculated and are illustrated below.
(Figure: original distribution; sampling distributions for n = 2, n = 3 and n = 4)

Simulations
To illustrate the general behavior of samples of fixed size n, 10000 samples each of size 30, 60 and 120 were generated from this uniform distribution and the means calculated.
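A simulation of this kind, together with exact enumerations of the two small examples above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seed, the variable names and the helper `enumerate_means` are not from the slides, and the football samples are taken as unordered and without replacement, as in the tables. For the football means of size 2, dividing by the number of samples gives a standard deviation of about 12.91 and dividing by one fewer gives about 13.23; for size 3 the two values are about 9.43 and 9.57.

```python
import random
import statistics
from collections import Counter
from itertools import combinations, product

# Football starters (weights in pounds) from the example above.
weights = [220, 200, 170, 180, 190, 210, 160]


def enumerate_means(population, n):
    """Means of every unordered sample of size n drawn without replacement."""
    return [statistics.fmean(sample) for sample in combinations(population, n)]


football = {}
for n in (2, 3):
    means = enumerate_means(weights, n)
    football[n] = {
        "count": len(means),
        "mean": statistics.fmean(means),
        "sd_divide_by_count": statistics.pstdev(means),
        "sd_divide_by_count_minus_1": statistics.stdev(means),
    }
    print(n, football[n])

# Spinner: exact sampling distribution of the mean for n = 2 (25 ordered pairs).
spinner = [1, 2, 3, 4, 5]
exact_n2 = Counter((a + b) / 2 for a, b in product(spinner, repeat=2))
for mean in sorted(exact_n2):
    print(mean, exact_n2[mean], exact_n2[mean] / 25)

# Spinner: simulated sampling distributions for larger n.
rng = random.Random(8)  # arbitrary seed, only so reruns match
sigma = statistics.pstdev(spinner)  # population SD of the spinner, sqrt(2)
simulated = {}
for n in (30, 60, 120):
    means = [statistics.fmean(rng.choices(spinner, k=n)) for _ in range(10_000)]
    simulated[n] = (statistics.fmean(means), statistics.pstdev(means))
    # Compare the simulated SD with sigma / sqrt(n).
    print(n, simulated[n], sigma / n ** 0.5)
```

In each simulated case the mean of the sample means should land very close to 3, and the spread should shrink roughly like sigma divided by the square root of n.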
Probability histograms were created for each of these (simulated) sampling distributions. Notice all three of these look to be essentially normally distributed. Also, the mean of each is 3 and the variability decreases as the sample size increases.
(Figure: histograms of the means for n = 30, n = 60 and n = 120)

Simulations
To further illustrate the general behavior of samples of fixed size n, 10000 samples each of size 4, 16 and 30 were generated from the positively skewed distribution pictured below.
(Figure: skewed distribution)
Notice that these sampling distributions are all skewed, but as n increased the sampling distributions became more symmetric and eventually appeared to be almost normally distributed.

Terminology
The mean of the distribution of sample means: $\mu_{\bar{x}}$
The standard deviation of the distribution of sample means: $\sigma_{\bar{x}}$

Properties of the Sampling Distribution of the Sample Mean
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$. This rule is approximately correct as long as no more than 5%* of the population is included in the sample.
Rule 3: When the population distribution is normal, the sampling distribution of $\bar{x}$ is also normal for any sample size n.
*10% depending on your text of choice. It's just like the Binomial Distribution rule for sampling without replacement.

Central Limit Theorem
Rule 4: When n is sufficiently large, the sampling distribution of $\bar{x}$ is approximately normally distributed, even when the population distribution is not itself normal.

Illustrations of Sampling Distributions
(Figure: symmetric normal-like population; sampling distributions for n = 4, n = 9 and n = 25)

Illustrations of Sampling Distributions, cont.
(Figure: skewed population; sampling distributions for n = 4, n = 10 and n = 30)

More About the Central Limit Theorem
The Central Limit Theorem can safely be applied when n exceeds 30. If n is large or the population distribution is normal, the standardized variable, z, has (approximately) a standard normal (z) distribution.
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Examples
Example 1: Non-CLT problem
The average number of detention hours assigned per offender at MHS is 5 hours with a standard deviation of 1.5 hours. If an offender is selected at random, what is the probability he/she served no more than 7 hours of detention? (This calculation assumes detention hours are approximately normally distributed.)

$\mu = 5$, $\sigma = 1.5$
$P(x \le 7) = P\left(z \le \dfrac{7 - 5}{1.5}\right) = 0.908$

CLT
Same Setup: What is the probability that a random sample of 30 offenders will have served an average of at most 7 hours? Notice that now we are talking about an average from a sample of items, not an individual item. This is CLT.

$\mu_{\bar{x}} = 5$, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1.5}{\sqrt{30}} = 0.274$
$P(\bar{x} \le 7) = P\left(z \le \dfrac{7 - 5}{0.274}\right) \approx 1$

Example
A food company sells "18 ounce" boxes of cereal. Let x denote the actual amount of cereal in a box. Suppose that x is normally distributed with $\mu = 18.03$ ounces and $\sigma = 0.05$ ounces.
a) What proportion of the boxes will contain less than 18 ounces?

$P(x < 18) = P\left(z < \dfrac{18 - 18.03}{0.05}\right) = P(z < -0.60) = 0.2743$

Example - Continued
b) A case consists of 24 boxes of cereal. What is the probability that the mean amount of cereal (per box in a case) is less than 18 ounces?
Because x is normally distributed, the sampling distribution of $\bar{x}$ is also normal (Rule 3), so…

$P(\bar{x} < 18) = P\left(z < \dfrac{18 - 18.03}{0.05 / \sqrt{24}}\right) = P(z < -2.94) = 0.0016$

Some Proportion Distributions Where P = 0.2
Let p̂ be the proportion of successes in a random sample of size n from a population whose proportion of S's (successes) is p.*
(Figure: sampling distributions of p̂ for n = 10, n = 20, n = 50 and n = 100, each centered at 0.2; condition noted on the slide: $np \ge 10$, $n(1 - p) \ge 10$)
* Or π, depending on your textbook of choice.

Properties of the Sampling Distribution of p̂
Denote the mean of p̂ by $\mu_{\hat{p}}$ and the standard deviation by $\sigma_{\hat{p}}$. Then the following rules hold.
Rule 1: $\mu_{\hat{p}} = p$
Rule 2: $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
Rule 3: When n is large and p is not too near 0 or 1, the sampling distribution of p̂ is approximately normal. (CLT for proportions.)
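Rules 1 to 3 can be explored with a small simulation for the p = 0.2 case pictured above. The following Python sketch is an illustration only: the seed, the 5,000 repetitions and the helper name `simulate_p_hat` are assumptions, not part of the slides. It compares the simulated mean and standard deviation of p̂ with p and with the Rule 2 formula, and flags whether each n meets the np ≥ 10 and n(1 − p) ≥ 10 rule of thumb used in this chapter.

```python
import math
import random
import statistics


def simulate_p_hat(p, n, num_samples, rng):
    """Simulate num_samples sample proportions, each from n independent trials."""
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


rng = random.Random(2024)  # arbitrary seed, only so reruns match
p = 0.2
summary = {}
for n in (10, 20, 50, 100):
    p_hats = simulate_p_hat(p, n, 5_000, rng)
    summary[n] = {
        "simulated_mean": statistics.fmean(p_hats),
        "simulated_sd": statistics.pstdev(p_hats),
        "rule_2_sd": math.sqrt(p * (1 - p) / n),
        "passes_rule_of_thumb": n * p >= 10 and n * (1 - p) >= 10,
    }
    print(n, summary[n])
```

With p = 0.2, only n = 50 and n = 100 meet the rule of thumb, which matches the visual impression that the smaller-n distributions are still noticeably skewed.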
Rule of thumb: If $np \ge 10$ and $n(1 - p) \ge 10$, the distribution is approximately normal.

Condition for Use
The further the value of p is from 0.5, the larger n must be for a normal approximation to the sampling distribution of p̂ to be accurate.
Rule of Thumb: Remember!!!!!! If both $np \ge 10$ and $n(1 - p) \ge 10$, then it is safe to use a normal approximation.

Example
If the true proportion of defectives produced by a certain manufacturing process is 0.08 and a sample of 400 is chosen, what is the probability that the proportion of defectives in the sample is greater than 0.10?
Since np = 400(0.08) = 32 > 10 and n(1 - p) = 400(0.92) = 368 > 10, it's reasonable to use the normal approximation.

Example (Continued)
$\mu_{\hat{p}} = p = 0.08$
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{0.08(1 - 0.08)}{400}} = 0.013565$
$z = \dfrac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \dfrac{0.10 - 0.08}{0.013565} = 1.47$
$P(\hat{p} > 0.10) = P(z > 1.47) = 1 - 0.9292 = 0.0708$

Example
Suppose 3% of the people contacted by phone are receptive to a certain sales pitch and buy your product. If your sales staff contacts 2000 people, what is the probability that more than 5% of the people contacted will purchase your product?
Clearly p = 0.03 and p̂ = 100/2000 = 0.05, so

$P(\hat{p} > 0.05) = P\left(z > \dfrac{0.05 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z > \dfrac{0.05 - 0.03}{0.0038145}\right) = P(z > 5.24) \approx 0$

Example - Continued
If your sales staff contacts 2000 people, what is the probability that less than 2.5% of the people contacted will purchase your product?
Now p = 0.03 and p̂ = 50/2000 = 0.025, so

$P(\hat{p} < 0.025) = P\left(z < \dfrac{0.025 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z < \dfrac{0.025 - 0.03}{0.0038145}\right) = P(z < -1.31) = 0.0951$

Review
• Central limit theorem for the sample means: $\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
• If the sample size is sufficiently large, then the sampling distribution of the sample means is approximately normal regardless of the shape of the parent distribution.
• Rule of thumb: it is generally safe to apply the CLT for means when $n \ge 30$.
• "Central limit theorem" for the sample proportions: $\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
• The sampling distribution of sample proportions is approximately normal when $np \ge 10$ and $n(1 - p) \ge 10$.
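The worked examples in this chapter can be re-checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the two helper functions and their names are illustrative assumptions, not part of the slides, and the detention example is treated as normal, which the slides do not state explicitly. Because the normal distribution is continuous, "at most" and "less than" give the same probability here.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_at_most(value, mu, sigma, n=1):
    """P(sample mean <= value) when the sample mean is (approximately) normal."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(value)


def prob_p_hat_at_most(value, p, n):
    """P(p-hat <= value) using the normal approximation for proportions."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("np and n(1 - p) should both be at least 10")
    return NormalDist(p, sqrt(p * (1 - p) / n)).cdf(value)


# Means: detention hours (normality assumed) and cereal boxes.
detention_one = prob_mean_at_most(7, mu=5, sigma=1.5)
detention_mean = prob_mean_at_most(7, mu=5, sigma=1.5, n=30)
cereal_box = prob_mean_at_most(18, mu=18.03, sigma=0.05)
cereal_case = prob_mean_at_most(18, mu=18.03, sigma=0.05, n=24)

# Proportions: defectives and phone sales.
defectives = 1 - prob_p_hat_at_most(0.10, p=0.08, n=400)
sales_high = 1 - prob_p_hat_at_most(0.05, p=0.03, n=2000)
sales_low = prob_p_hat_at_most(0.025, p=0.03, n=2000)

for name, value in [
    ("detention, one offender", detention_one),
    ("detention, mean of 30", detention_mean),
    ("cereal, one box", cereal_box),
    ("cereal, case of 24", cereal_case),
    ("defectives > 0.10", defectives),
    ("sales > 0.05", sales_high),
    ("sales < 0.025", sales_low),
]:
    print(f"{name}: {value:.4f}")
```

Small differences from the slide values (for example, about 0.070 rather than 0.0708 for the defectives) come from rounding z to two decimals before reading a table.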